Measuring Educational Progress 



A STUDY OF THE NATIONAL ASSESSMENT



WILLIAM GREENBAUM 

Harvard University 

with MICHAEL S. GARET, M.I.T.
and ELLEN R. SOLOMON, Harvard University 




including 
a response from 

the Staff of the National Assessment of Educational Progress 


McGRAW-HILL BOOK COMPANY

New York  St. Louis  San Francisco  Düsseldorf
London  Mexico  Sydney  Toronto



Copyright © 1977 by McGraw-Hill, Inc. All rights reserved. Printed in
the United States of America. No part of this publication may be
reproduced, stored in a retrieval system, or transmitted, in any form or
by any means, electronic, mechanical, photocopying, recording, or
otherwise, without the prior written permission of the publisher.


Library of Congress Cataloging in Publication Data 

Greenbaum, William.
Measuring educational progress.
Bibliography: p.
Includes index.

1. National Assessment of Educational Progress
(Project). 2. Educational surveys. 3. Education — United
States. I. Garet, Michael S., joint author. II. Solomon, Ellen R.,
joint author. III. Title.
LB2823.G73 379'.151'0973 76-25986

ISBN 0-07-024285-2

123456789 RRDRRD 754321087 


The editors for this book were Thomas Quinn and Cheryl Hanks, the designer
was [name illegible], and the production supervisor was Milton Heiberg. It was set
in Baskerville with display lines in Bookman by National ShareGraphics, Inc.


Printed and bound by R. R. Donnelley & Sons Company.



CONTENTS


Foreword
Preface
Acknowledgments

PART ONE: A STUDY OF THE NATIONAL ASSESSMENT

1 Introduction

2 NAEP's Objectives and Organizational Development

3 Dividing Knowledge into Subject Areas

4 The Subject Matter Objectives

5 The Assessment's Exercises

The Exercise Development Process

Technical Aspects of the NAEP Exercises

The NAEP Exercises and Criterion-Referenced Measurement

6 NAEP's Measurement of Background Variables

7 The Sampling Design and Exercise Packages

8 Reporting the Assessment Results

9 Conclusion: Past and Future Uses of the Assessment

10 Epilogue: Social Indicators and the Reform of Education

PART TWO: RESPONSE OF THE NATIONAL ASSESSMENT
OF EDUCATIONAL PROGRESS

1. Goals and Accomplishments of the National Assessment 1969-1975

2. Choices Faced by NAEP's Planners

3. Goals of the Assessment and Progress toward Achieving Them

4. Conclusion: Comments on the National Assessment


Selected Bibliography
Index



FOREWORD 


The 1867 Act establishing the U.S. Department of Education specified that its purpose should include collecting information which would “show the condition and progress of education in the several states and territories.”

In 1963, in the interest of a fuller implementation of this provision of the law, the then Commissioner of Education, Francis Keppel, asked John Gardner, the president of Carnegie Corporation, whether the Corporation might be willing to support consideration of the feasibility of establishing some kind of system to measure the educational level of the United States population, presumably on a recurrent basis. The Corporation responded favorably and in December 1963 and January 1964 sponsored two meetings of educators, test experts, officials, and lay persons to explore the questions of whether, and if so how, some kind of national testing program should be established. The meetings led to the creation of the Exploratory Committee on Assessing the Progress of Education (ECAPE), chaired by Dr. Ralph Tyler, Director of the Center for Advanced Study in the Behavioral Sciences. As the plans for implementing the assessment proceeded, ECAPE became CAPE, and then in 1969 CAPE was adopted as a project of the Education Commission of the States and became the National Assessment of Educational Progress (NAEP).

Support for the development of the Assessment came in a series of grants from the Corporation, from the Fund for the Advancement of Education, the Ford Foundation, and from the U.S. Office of Education. The current operations are funded by the federal government (originally by the Office of Education and now by the National Center for Educational Statistics, a separate branch of the Department of Health, Education, and Welfare). Carnegie Corporation's total contributions to the development of NAEP amounted to $2,432,900.
Given that amount, coupled with the degree of [...] participation in the planning and development of NAEP and the potential importance of the project itself, [...] decided in 1971 to [...] fund [...] project the Corporation had supported [...]. The Corporation commissioned the Center for [...] School of [...] David K. Cohen, to undertake a review and evaluation of the National Assessment, particularly of the period during which the Corporation [...] involved [...]. The evaluation was to take about half a year, and $23,984 was allocated for [...]. The evaluation was carried out by William Greenbaum and his colleagues, Michael Garet and Ellen Solomon. They had the assistance of consultants where [...].

The evaluation was completed [...] 1973. Although a good deal of attention is given to the quality of the [...] and to the nature and implications of the sampling design [...], the evaluation is not a detailed technical consideration [...]. Instead it is primarily a thoughtful analysis of the expectations [...] which lay behind the design of the Assessment [...] and the implications of that [...]. The evidence is mainly [...] record of some of the discussions leading to the creation of the [...] documentation from NAEP [...] the informed layperson [...] argument is therefore [...] stand or fall [...] conceptions, and [...] impressed by the [...] reasonably deliver. The evaluators caution that either the expectations or the design, or perhaps both, should change if this mismatch is not to become increasingly troublesome.

This study was commissioned as a report to the Corporation, and it therefore had the responsibility of deciding how it was to be used and to whom it should be made available. It felt as a matter of principle and of foundation accountability that the products of its support should be open to public scrutiny.


An evaluation, however, both by definition and by intention, should not be a neutral document. It has to take a stand, to make judgments. If it is an evaluation of any real-life enterprise, those judgments can have an impact on other people's real interests. NAEP had its origin [...] support, it was the product [...] foundations and the government had [...] connected with the foundation [...] supported by the [...] given support. By the [...] government. This evaluation [...] with the full cooperation of the NAEP staff and of the other funding [...]. I would like to commend them warmly for their willingness to be [...] not be participants and yet [...] them to possible criticism.

The Corporation felt that the report [...] available immediately to NAEP and to its other sponsors so that [...] would have a chance to take advantage of any suggestions they [...] and so that they would have an opportunity to respond [...] and to correct [...]. The report was sent to the Corporation [...] and for their April 1973 meeting Ralph Tyler, who had [...] leading role in developing NAEP and who then headed its Analysis [...], was invited to join William Greenbaum in leading [...] the report. It was clear that there was disagreement on a [...]. The Corporation encouraged NAEP to prepare a written [...]. The Corporation felt that the report should be made available [...] public, but we also felt [...] in fairness to the serious issues [...] a response from NAEP should accompany it. [...] NAEP's reaction was prepared [...] package was then [...] one of the [...] education,” it is itself an evaluation, and it is inherently controversial. Its future directions, and the future of other expressions of our national desire to know how we are doing and how we could do better, are matters of continuing debate. I hope that this publication will make an interesting and constructive contribution to that debate.

[...] that report may prove interesting and useful not just as a contribution [...] of the Assessment but [...] evaluation [...] and, perforce, [...] a good deal from the report about the role of the [...] launching [...] and reality in the [...] the processes of evaluation itself. I expect that we will learn [...] the ways [...] which the report and the response are received.

Alan Pifer 
President 

Carnegie Corporation 



PREFACE 


Evaluation seems to spawn further evaluation, but so long as essential actions are not forestalled by the process, it seems a healthy one. The National Assessment of Educational Progress (NAEP) was created to evaluate the progress of education in the United States; then we were asked to conduct a small-scale assessment of the Assessment; and now there may well be others who will evaluate the purposes, methods, evidence, and conclusions of our evaluation. Redundant as such a process might at first appear, such evaluation (and hopefully self-evaluation as well) may be prerequisite to a more complete understanding of social policies and social indicators.

Several aspects of this evaluation deserve mention here. First, as is described in more detail in the Introduction, it seemed proper to evaluate the National Assessment on the basis of several quite different criteria. Most significantly, we have examined the Assessment's purposes, methods, and conclusions not only in terms of the Assessment's own definitions but also in terms of other possible and reasonable approaches to defining and measuring the nation's educational progress. Both these sets of criteria seemed necessary to a full evaluation. In the report we have tried to be explicit about when we are measuring NAEP against its own definitions of its purposes, methods, and conclusions and when we are measuring it against approaches which it never considered or adopted. For instance, the list of the Assessment's objectives and subobjectives which appears in Chapter 2 includes only NAEP's purposes as they appear in its early planning documents.





A second aspect of this evaluation concerns methodology. Efforts were made to approach the evaluation through as many information sources as possible. All early correspondence and transcripts of planning meetings relating to the development of the Assessment were reviewed. All available documents [...] by NAEP between 1968 and 1974, including statements of [...], reports on [...] were [...]. Similarly, outside commentaries [...] validity and utility of Assessment results were examined. Extensive [...] conducted with NAEP policy [...] spring and late summer of 1972. Interviews were also conducted [...] National Center for Educational Statistics, the branch of the U.S. Office of Education (USOE) responsible for monitoring the National Assessment's activities. In late 1972 and early 1973 our preliminary findings [...] presented [...] both NAEP officials and the related personnel at USOE. [...] the National Assessment was invited to respond to the evaluation report [...] the pages of this book, a procedure which we feel is in the [...] of fairness and complexity.

Third, there is the issue of the [...] publication of this book. It is apparent from recent experience with policy and program evaluations that no matter how accurate and useful [...] particular evaluation might be, the [...] findings, and recommendations [...] undergo distortions depending on how and when [...] evaluation is disseminated. It was our judgment that this evaluation [...] prove most useful if NAEP and the pertinent staff at USOE had an [...] period of time to debate and perhaps act upon the substantive aspects [...] before its publication. We assumed that broader political debate [...] would and should [...] later [...] and that our report would be made [...] help inform [...] whether the chances of [...] social policies will be enhanced [...] deliberately staged separation of the [...] substantive and more political [...] of policy [...]. In this case we believe that USOE has made extensive use of the report which might well have [...] impossible in an atmosphere of constant and contemporaneous charges from [...] the same time, while NAEP has had to respond to numerous hard questions and criticisms from USOE in the last two years, the relative lack of direct Congressional criticism may have delayed a serious NAEP response to the larger questions raised by the report. The main issues raised in the report remain as crucial to the Assessment today as they were when the first draft was circulated some two years ago. Where there may have been changes in operational details [...] Assessment, the inclusion of a response [...] allows NAEP [...] mention them and to take issue with [...].

Fourth, we [...] openness in making available [...] development of the Assessment. We [...] precedent and one which other project-initiating [...] public might follow. Without Carnegie's in-house [...] of the Assessment might well have been restricted to measuring [...] against its own statement of purposes at the start of [...] evident in the report itself, the use of [...] standing of the range of [...] expectations that surrounded the creation of the Assessment from [...] 1968 and greatly enhances our knowledge of decision-making processes [...] as well as of the Assessment in particular.

Finally, if our evaluation [...] quite critical of the Assessment, it should be remembered that [...] are offered in a constructive spirit and that they are [...] and technical issues, not individual people. We must stress that [...] conducted the evaluation with [...] almost a decade [...] having [...] upon this knowledge in the future.

William Greenbaum


Cambridge 

December 1975 



ACKNOWLEDGMENTS 


This evaluation of the National Assessment of Educational Progress was supported by a grant from the Carnegie Corporation. The Corporation's willingness to support an outside evaluation of one of its own major projects is commendable, as is its making available to us virtually all significant documentation relevant to the life of the Assessment. We acknowledge too the cooperation of both the leaders and staff of the National Assessment and its administrative sponsor, the Education Commission of the States. And we are grateful for the aid we received from the staff at USOE's National Center for Educational Statistics, the federal agency responsible for financing and monitoring the National Assessment.

In preparing our evaluation, we were given important help by several people; this report was strengthened by their suggestions, but of course they are not responsible for its limitations. Nikki Smith, of Harvard University, and Elizabeth Nardine performed the analysis of the Reading and Literature objectives and the item analysis of the exercises in those subject areas. David K. Cohen, chairman of the Education and Social Policy Program at the Harvard Graduate School of Education, participated in crucial early discussions of approaches to such an evaluation. Stan Bolster, professor in Learning Environments, Harvard Graduate School of Education, contributed to our understanding of the historical context of curriculum development during the last two decades.





Measuring Educational Progress 






PART ONE 


A STUDY OF 

THE NATIONAL ASSESSMENT 



1 


INTRODUCTION 


In recent decades there has been increased interest in establishing systems of social accounting and evaluation to measure progress, or its lack, in policy areas such as education and health. If society is to become more self-aware and improve itself through use of social accounting and evaluation, however, it must first become more self-conscious about these processes themselves. The potential, limits, and negative side effects of these processes must be more fully understood, for the sake of both those who engage in such work and those who enjoy or suffer its consequences. It is essential to explore the conceptual and political limits of seeking change through improved information, recognizing the tension that emerges from the interplay between the “softness” of reality and the social sciences and our urgent need for “hard” information on which to base decisions. It is also necessary to think more deeply about the effects of raising or lowering expectations for what can be accomplished through social accounting and evaluation, and therefore for the society in general. Social scientists must avoid raising societal expectations for systematic change too far beyond those that can be fulfilled, or lowering expectations to such a point of “realism” that sight of society's ideals is unwittingly lost. Moreover, serious consideration must be given to the differences between evaluations which can produce relatively immediate and real improvements, versus those that can produce meaningful symbols which might lead to progress only in the future, versus those which have little direct utility or positive symbolic value but which become excuses for existing inadequate conditions.






[...] book [...] hope to make a contribution to [...] larger effort [...] more about the [...] of evaluation and change. [...] is in the field, or rather the quagmire, of education, a [...] that has been ravaged by dramatic societal changes [...] promises and cynicisms. Second, the task is complicated [...] area of public policy in [...] by [...] of our political [...] which exercises a profound effect [...] on the character of the educational system, but also on our capacity to [...] meaningful evaluations of that system. Our primary task in this Report is [...] the performance and utility of one particular effort at social accounting, the National Assessment of Educational Progress (NAEP), now a $6 million [...] United States [...] Education. We were [...] to perform this particular evaluation rather than undertake a broad [...] of the general processes of social accounting and [...] approached our task with these larger aspects firmly in mind [...]. As the Report demonstrates, the two inquiries are inextricably intertwined. [...] the intersection of American education, politics, and social [...] that NAEP evolved its purposes, and it is at that same [...] that we are to develop deeper insights into the challenge [...] NAEP [...] which will be encountered by any future effort to measure levels of functional literacy [...] society.


““‘"“■ESS^scor.snn 

p ”~ - - 

PetioildSW-lpoa', 0 ?°' f c ah ' ad Wl <h II (laic 19G3!!ll a ] nd , d,SCUSS ' on5 lcad ' 
Anexirnen, opc^uo^'^i^jo^aottounccmcm ohu^fnnp^J^^' dc3! H n 

•'lihemomen, 

— ■>< 

>*>"»«h emThaZ and ~ ‘ 9M ’ “ ' h ' 

- n „ P . ,:z:zz7z:::7' mr 





[...] The ultimate goal of National Assessment is to provide information that can be used to improve the educational process, to improve education at any and all [...] useful about what students [...] developed or what their attitudes are. In brief, then, National Assessment aims at providing information in one [...] areas of education where more information is needed, the [...] knowledges, skills, understandings, and attitudes.*

During 1969-70 the [...] areas (Science, Citizenship, and Writing) [...] approximately 80,000 respondents, sub[...] groups: nine-year-olds, thirteen-year-olds, seventeen-year-olds, [...] adult group (ages twenty-six to thirty-five). During [...] were added Reading and Literature, and during 1971-72 the exercises in Social [...] and Music. The long-term plan [...] exercises in each of [...] subject areas, the above seven plus Ma[...] and Career and Occupational Development, every few [...] knowledge, attitudes, skills, and understandings can be traced over time. [...] positive terms to the [...] of substantial inter[...] various of its [...] constituencies. The responses have ranged from [...] the trends and uses of [...] Assessment will be [...] Assessment has produced no [...] directly usa[...] already known


* Frank Womer, [...] Education Commission of the States [...] (hereinafter [...])





and cannot be expected of the Assessment. One central issue throughout this Report is the meaning of the word “ultimate” in the quotation cited above: “The ultimate goal of National Assessment is to provide information that can be used to improve the educational process.” Or, more directly: How long will the society in general and Congress in particular spend several million dollars per year, or more, without demanding some more direct usefulness of the Assessment results?


THIS EVALUATION’S APPROACH 

To fulfill our charge of evaluating the performance and utility of the National Assessment of Educational Progress, we first had to choose from a wide range of possible criteria. For example, the Assessment could be evaluated in terms of its very earliest objectives, if any could be discerned; its formal objectives as they were presented in 1968-69 when the Assessment actually [...] functioning; its operational subobjectives and means, some of which can now be deemed ends themselves; its unintended consequences; the extent to which its objectives might have been reached either more successfully or less expensively by other means; the conceptual and political factors that helped determine its objectives; or, finally, in terms of fundamentally different approaches that might have been taken in deciding why and how educational progress in America should be evaluated. For the sake of fairness [...] the subject's complexity, we decided to take all [...] account. The [...] here [...] several kinds of criteria [...] clear awareness of the ambiguities inherent in the Assessment's purposes and methods.

Two limitations of this Report must be mentioned. First, some of the de[...] materials and procedures are bound to be [...] NAEP [...] changes its materials and procedures. However, the [...] changes that have been typical of [...] and that are planned [...] the foreseeable future do not [...] basic characteristics which this study describes and analyzes. [...] more importantly, while the study can evaluate [...] made [...], it cannot as readily elucidate the intentions of those who [...]. We interviewed many key individuals, and [...] thousands of [...] transcripts of all the early planning meetings; in addition, all available in-house memoranda and correspondence have been examined. And yet, no doubt, some intentions will be interpreted differently from the way they were experienced.

The purposes of this inquiry require an effort to ascertain not only the Assessment's present efficacy but also factors that might increase or limit its efficacy in the future. This latter has necessitated a special concern with the Assessment's past, and thus we have given attention to both its history and many of its technical aspects. Macroanalysis of the past illuminates the broader conceptual and political realities that will help shape the Assessment's future, and microanalysis provides a detailed picture of what would have to be done, and what can be done, to increase its overall utilization. At the same time, however, we also hope that several sections, particularly Chapters 3-8 and the Epilogue, Chapter 10, will be independently useful within the specialized areas they address.



2 


NAEP’S OBJECTIVES AND 
ORGANIZATIONAL 
DEVELOPMENT 


The recorded history of the National Assessment of Educational Progress began September 5, 1963, when Francis Keppel, the U.S. Commissioner of Education, contacted John Gardner, then president of the Carnegie Corporation, seeking foundation support to develop the Assessment. On September 16, a $12,500 discretionary grant was awarded to sponsor two conferences to “discuss the means of ascertaining the educational level attained through American public education.” Ralph W. Tyler, director of the Center for Advanced Study in the Behavioral Sciences, was approached by the Carnegie Corporation to begin planning.

The first indication that the Assessment was an inevitability appeared in the wording of the original discretionary grant. The conferences were to discuss the means of developing an assessment, rather than the value or utility of such an undertaking. One staff member of the Carnegie Corporation picked up this indication of inevitability and in an in-house memo suggested that the conference time be devoted to discussing the larger questions of the potential worth and feasibility of such a project. The response was to retain plans for the one-day conference on developing the Assessment, but to precede the conference with an evening dinner meeting to deal with and presumably answer affirmatively the utility questions, so that planning could begin the next day. In addition, the intent of the planning session was qualified somewhat: the conference finally was called “to explore whether developments in testing and in methods of sampling now enable a fair assessment of the level of national educational attainment.” However, this more detailed objective, still emphasizing technology rather than utility, did not really encourage serious discussion of the project's potential worth.

The first conference took place December 18-19, 1963, in New York under the chairmanship of John Gardner. Many of the most capable and influential figures in American education and social science attended. From the foundations: John Gardner, Lloyd Morrisett, and Alan Pifer of the Carnegie Corporation, Frank Bowles of Ford, and Orville G. Brim and David Goslin of Russell Sage; from institutions of higher education: Lee J. Cronbach of Stanford, John Fischer of Columbia Teachers College, and John Tukey of Princeton; from the testing services and research institutes: John Flanagan of the American Institute for Research, Henry Dyer of Educational Testing Service, John Holland of National Merit Scholarship Corporation, [...] Rossi of the National [...]; and from government organizations: Francis Keppel, United States Commissioner of Education, Ralph Flynt and David [...] of [...] Education, and William Firman of the N[...] Education [...].

At the dinner meeting on December 18, following several humorous exchanges between Gardner and Keppel, [...] his interest [...] the project.1 He noted that in [...] Education would be 100 years old and that in that time [...] had been done to honor the mandate of the Act of 1867 which created the Office [...] asked it to report on the “condition and progress of American education.” [...] briefly mentioned that the Assessment idea arose because [...] found it difficult to go before Congress “without evidence that is clear and dependable on what you really mean by lack of [...]” [...] stated that he had brought the idea before the Carnegie Corporation because of his “family

1 The history [...] this [...] his father, Frederick P. Keppel, [...] and school systems and otherwise [...] country.”





connections and a filial connection with Gardner,” remarking that he came
with both “filial and disrespectful affection.” 3

A few moments later, Ralph Tyler assumed what can only be regarded as 
his continuing public chairmanship of the Assessment He observed that he 
and John Tukey had already met and begun exploring ways of evaluating 
the nation’s educational progress. During the course of the next few hours of 
dialogue, it became clear that Tyler had already formulated strong notions of 
how he would go about developing the Assessment To summarize his presen- 
tation, he emphasized that (1) the Assessment would test general levels of 
knowledge, “what people have learned, not necessarily all within the school 
system,” (2) the tests would not be aimed at discriminating among individu- 
als, unlike most educational tests, (3) there would be an attempt to assess 
more accurately the levels of learning of the least educated, average, and 
most educated groups in the society, (4) some sort of matrix sampling system 
would test individuals only on a small number of questions but results could 
be aggregated to reflect the knowledge of particular subgroups in the popula- 
tion, (5) adults might be included in the sample, (6) stages, such as the end of 
elementary school, the end of intermediate school, and the end of high school, 
should be used in connection with specific testing ages rather than at specific 
grade levels, and (7) the effects of the tests themselves would have to be 
carefully considered because they might become standards for educational 
curricula and might also reflect on the status of particular communities. 4 
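
The matrix sampling idea in point (4) can be illustrated with a minimal sketch. It is not NAEP's actual procedure: the function names, the pool of 12 exercises, the 300 respondents, and the three exercises per respondent are all hypothetical values chosen for illustration. The sketch shows only the general principle that each person answers a few exercises while pooled responses still yield a proportion-correct estimate for every exercise in a group.

```python
import random


def assign_items(n_people, n_items, items_per_person, seed=0):
    """Give each respondent a small random subset of the item pool."""
    rng = random.Random(seed)
    return [rng.sample(range(n_items), items_per_person) for _ in range(n_people)]


def group_item_rates(assignments, answers, n_items):
    """Pool responses across a group to estimate proportion correct per item.

    answers[p][k] is True when person p answered their k-th assigned item
    correctly. Items nobody saw are reported as None.
    """
    seen = [0] * n_items
    correct = [0] * n_items
    for items, marks in zip(assignments, answers):
        for item, ok in zip(items, marks):
            seen[item] += 1
            correct[item] += int(ok)
    return [c / s if s else None for c, s in zip(correct, seen)]


if __name__ == "__main__":
    # Hypothetical pool: 12 exercises, 300 respondents, 3 exercises each.
    true_rates = [0.2 + 0.05 * i for i in range(12)]
    rng = random.Random(1)
    assignments = assign_items(300, 12, 3)
    # Simulate answers: each response is correct with the item's assumed rate.
    answers = [[rng.random() < true_rates[i] for i in items] for items in assignments]
    for i, rate in enumerate(group_item_rates(assignments, answers, 12)):
        print(i, true_rates[i], rate)
```

No individual receives a meaningful score under such a scheme, which is consistent with point (2): the results describe groups, not persons.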

The transcript of the first conference covers some two hundred pages; for
the moment, the deliberations can be characterized as follows. First, it was
generally assumed that more information about educational achievement
would be useful per se. Yet, given this assumption, and the general thrust of
the agenda and tone of the meeting, there was no sustained consideration of
how the Assessment could be useful specifically. Second, numerous penetrating
questions were raised about the various objectives of the Assessment and how
they would be achieved, but almost none of these questions were resolved
during this meeting, nor have they been in subsequent years. Finally, although
some individuals dissented on certain aspects of the general proposal,
the group as a whole gave its prestigious consent to the idea, and, more
importantly, to Tyler and Tukey's specific conception of the idea. Indeed,
Tyler's first six proposals made during that first meeting have guided and
dominated the Assessment from that time until the present.


3 Proceedings of the National Testing Project Conference, p. 6.
4 Proceedings of the National Testing Project Conference, pp. 8-10.






Within another month, three more major elements of NAEP’s design were
formulated. A second major conference took place on January 27-28, 1964,
and while most of the participants had also been present at the first conference,
they were joined by several additional educational leaders: James Allen,
New York State Commissioner of Education; William Carr, of the National
Education Association; Harold Howe, then superintendent of schools,
Scarsdale, New York; Leon Minear, superintendent of public instruction,
Salem, Oregon; Logan Wilson, president of the American Council on Education;
Robert Wyatt, president of the National Education Association; Edward
Meade, the Ford Foundation; William Golden, member of various
boards of directors; John Corson, Princeton University; George
Stoddard, New York University; and Robert Thorndike, Columbia Teachers
College.

The brief presentation that went out to participants in the second conference
suggested, among other things, that the “outlines of the initial assessment
might be as follows”:

Tests will cover all the main areas of the curriculum: reading,
language, mathematics, science, and social studies.

Testing will be limited to two or three [...]

[...] is with the achievement of groups [...] obtained by giving
[...] individuals. (Emphasis added.)5

In addition, during the early part of the meeting, in summarizing the first
meeting, it was stated that the Assessment would be broad enough to fairly
reflect the aims of education in the [United States].

When these three extra proposals were added to the original six, essentially all the
most important aspects of the National Assessment [were in place]. The complete
list of guidelines includes (1) assessing general knowledge, not just what
people learn in schools, (2) moving away from norm referencing, which ranks
individuals against each other, [toward a form] of measurement ranking
individuals in terms of what they actually know about specific things, (3)
testing at different levels of difficulty to discern [the levels of learning of the least]
educated, average, and most educated groups, (4) employing a
matrix sampling system, (5) including adults, (6) testing school age children
by age near the end of elementary, [intermediate], and high school, (7) testing
for knowledge of specific subject areas, (8) using a short testing period and
concentrating on measuring the achievement of groups, not individuals, and
(9) being broad enough to reflect the aims of American education and to
avoid distorting educational curricula and making invidious comparisons
among communities.

5 [...] Jan. 4, 1964, Carnegie files.

We repeat this list because within it are contained practically all the major
strengths, limitations, and contradictions of the ultimate Assessment design.
We also repeat the list to express our surprise. In the next five years
(1964-1968) there were dozens of conferences and many heated debates about what
the Assessment should be and how it should be designed. But in essence, when
all the participatory processes were done, virtually all the most important
decisions that had been made were basically unaltered versions of those made
by Tyler and Tukey from September 1963 to January 1964.


THE EARLIEST OBJECTIVES OF THE ASSESSMENT 

The January conference was quite similar in main emphasis and style to
the first conference. The general assumption about more information being
useful per se was still widely accepted. But, despite the lack of consideration
of specifically whether or how the assessment would achieve its objectives,
from the transcripts of the first two conferences we can draw some conclusions
about what the original objectives of the Assessment were.6 Examination of
the transcripts of these two meetings and of all the related in-house memoranda
reveals five different types of objectives for the Assessment. No single
objective can be fairly evaluated until we have at least tried to define its
relation to others. First, and of greatest importance, were the major short term
objectives. Second were major long term objectives. Third were subordinate objectives,
proposed partly because they might themselves be useful in secondary ways
and partly because of political expediency. Fourth were important but understated
testing objectives, which can be viewed as major low-profile objectives.
Finally, there were operational objectives, which had to be met if the Assessment
was to be implemented in the first place.

Below we list these original objectives of the Assessment, placing them in 
what the available evidence suggests are the most appropriate categories. 


6 While the objectives are drawn from all documents regarding the early
planning meetings, they rely primarily on the Summary Report: Two Conferences on a
National Assessment of Educational [...], prepared by David Goslin at Russell Sage
Foundation. Carnegie files.




The Earliest Objectives of the Assessment

A. Major Short Term Objectives

1. To obtain meaningful national data on the strengths and
weaknesses of American education (by locating deficiencies
and inequalities in particular subject areas and
particular subgroups of the population).

2. To provide this data to Congress, the lay public, and
educational decision makers so that they could make
more informed decisions on new programs, bond issues,
new curricula, steps to reduce inequalities, and so on.

3. To provide this data to researchers working on various
teaching and learning problems, either to answer research
questions or to identify specific problems which
would generate research hypotheses.

B. Major Long Term Objectives

4. To continue collecting the national data at regular intervals
so that comparisons could be made over time
concerning national levels of achievement, performance
in various subject areas, and performance in various
subgroups, vis-à-vis themselves and other subgroups.
This would provide a census of educational progress in
America.

5. To forestall the development of “less effective or misdirected”
attempts at assessment. Some backers of the Assessment
disapproved of plans for a California assessment
and the national proposals of Admiral Hyman
Rickover, both of which were less interested in reducing
inequality than in separating elites from the average
population so “excellence” could be pursued efficiently.7

6. To make international comparisons possible once sampling
and testing problems could be resolved.


C. Subordinate Objectives

7. To promote concern about more meaningfully defining
the nation's educational objectives.

8. To provide comparative data to stimulate competition
among the states and local communities (without encouraging
invidious comparisons).

D. Major Low Profile Objectives

9. To lead a movement away from relying solely on norm-referenced
testing, which discriminates among individuals,
and toward some form of objective or criterion-referenced
tests that assess how much an individual or
group actually knows about a particular area of knowledge.

7 See H. G. Rickover, American Education: A National Failure (New York: Dutton, 1963).

10. To lead a movement away from current testing which 
relies largely on measuring knowledge in ways that ov- 
eremphasize memorization and that underemphasize 
actual skills, understandings, and attitudes. 

11. To encourage new modes of testing that are better fitted 
to the kinds of information being gathered and the par- 
ticular characteristics of the respondents. 

E. Operational Objectives 

12. To create an independent committee to manage the de- 
velopment of the Assessment. 

13. To develop widespread acceptance among the educa- 
tional establishment so that the Assessment could gain 
access to school systems. 

14. To develop widespread political support so that the fed- 
eral government would take over the funding of the As- 
sessment, while at the same time assuring that represen- 
tatives of state and local governments would not be too 
uneasy about the project being federally funded. 

15. To develop lists of educational objectives that would 
“fairly reflect the aims of American education” and 
serve as guides for the exercise writers. 


Each of these objectives is considered later in detail, but a few comments
must be made at this point concerning the interrelationships among these
objectives and the development of the Assessment. Between 1964 and 1968 a
very high proportion of time was spent trying to achieve objectives 12-15, the
operational objectives. While this prevented leaders from pursuing the broader
objectives, it was an understandable preoccupation since organizational
and political support had to be stimulated, and the lists of educational objectives
developed. This indicates one of the several ways in which the politics of
American education limited the Assessment.

And yet there is a much more basic point to be emphasized here. The
preoccupation with operational objectives helped prevent the Assessment’s original leaders
from recognizing the extent of their own differences with each other. These were not so
much differences in priorities as to what the Assessment should be, as they were differences
in expectations as to what the Assessment could be. But, of course, in turn, diverse
expectations led to diverse priorities.

In examining the first four major objectives and the transcripts and correspondence
relating to these, it seems clear that Keppel, Gardner, and Tyler
assigned similar top priorities to objectives 1 and 4, the immediate and long
term collection of censuslike data on the strengths and weaknesses of American
education. The priorities with regard to objectives 2 and 3 were also
strong and fairly close, at least between Keppel and Gardner, although, given
their respective positions in the government and in the foundation, it is not
surprising that Keppel showed a greater desire for politically useful data
(objective 2) and that Gardner showed a slightly stronger desire for data
helpful to researchers (objective 3).

The truly significant differences occur, however, with regard to expectations.
While Keppel and Gardner believed that Tyler’s concept of a census
might be useful and feasible, Tyler never really reciprocated by agreeing the
Assessment could actually produce data in the short run that would be directly
applicable either in policymaking or in research. The point is not to suggest
that Tyler did not care about producing short term data for direct use in
policy or research; rather, it is that as an active social scientist he had a
strong sense that such data could not be gathered in the short run. [Given his]
low expectations in this regard, it is not surprising that [he assigned]
little priority to objectives 2 and 3, to provide useful data for policy makers
and researchers.

At first reading this may seem to be a [...], but it is not.
Keppel and Gardner surely expected some relatively [...] and substantial
returns in terms of objectives 2 and 3, [...] USOE
expect such returns now. Thus, [the difference] in expectations for
the Assessment is crucial. And so is [the fact that] Tyler has consistently promised
no more than the Assessment realistically [can] produce while refusing
overtly to contradict other NAEP officials [...] that the Assessment can
fulfill objectives 2 and 3. Tyler’s stance [is understandable], since the federal
government might not as generously support a [...] descriptive census,
but nonetheless the continuing gap between [...] expectations for the Assessment
and its real potential creates intense conceptual, organizational, and
political strains.


THE ORGANIZATIONAL DEVELOPMENT
OF THE ASSESSMENT

Following the second planning conference [...] 1964, John Corson of
Princeton University [...] tentative organizational
plans for the [Assessment]. Early in that memorandum, [...]
1964, Corson noted that the Assessment idea had not really been tested with
many representatives of the education establishment and that he expected
major opposition from school administrators and other organized interest
groups. Indeed, he noted that developing “greater consensus among opinion
influencing groups that a periodic assessment is feasible and desirable” might
be the major task before the project could be launched.8

Other tasks for the developmental period included identifying the educational
purposes and objectives toward which progress is to be measured,
developing the instruments for assessing the status of education, and planning
how the periodic assessment shall be administered. Corson’s paper assumed
that eventually the United States Office of Education (USOE) would take
over responsibility for the Assessment and would contract to have it administered
by some nongovernmental agency or nonprofit corporation. During the
interim he recommended that a temporary committee be appointed to do the
initial work, especially to meet with educational interest groups to test their
opinions and convince them of the Assessment’s feasibility and desirability.
He also recommended that approximately one year later a presidential commission
on the progress of education be delegated to carry on the Assessment’s
development.

In June 1964 the Carnegie Corporation made a $100,000 grant to establish
the interim committee, which met for the first time in August 1964 and chose
the name the Exploratory Committee on Assessing the Progress of Education,
which became known as ECAPE. Ralph Tyler served as chairman.9 The
committee’s life expectancy was five months, and its mandate was to:

Develop a greater consensus among influential educational groups that
a periodic national assessment of education is feasible and desirable.

Develop the instruments for assessing the status of education.


8 “Launching a National Educational Assessment,” Carnegie files, p. 4.
9 The other members of the committee included Melvin W. Barnes, Superintendent
of Schools, Portland, Oregon; John J. Corson, Princeton University; Paul F.
Johnson, State Superintendent of Public Instruction, Iowa; Devereaux C. Josephs,
New York Life Insurance; Roy E. Larsen, Time, Incorporated; Katherine E.
McBride, president, Bryn Mawr College; The Reverend Paul C. Reinert, president,
St. Louis University; Mabel Smythe, principal, New Lincoln School, New York City.
The staff director was Stephen B. Withey of the Survey Research Center, University
of Michigan, Ann Arbor.






Plan how the assessment could be administered and monitored in the
public interest.


ECAPE held a series of conferences throughout the nation with various
groups of school superintendents, administrators, and curriculum personnel.
While considerable interest in the hypothetical value of an assessment was
demonstrated at these conferences, it was made very clear that opposition to
federal control of such an undertaking was substantial. This was an important
factor leading to the conclusion that it was premature to appoint a
presidential commission on the progress of education, and so ECAPE’s life
was extended and actually lasted through most of the development period, or
until July 1, 1968.

Between 1964 and 1968 ECAPE was responsible for coordinating, overseeing,
and evaluating the work of the various contractors chosen to perform
specific tasks essential to implement the Assessment. In [...]
more time and energy was spent kindling support among [...]
educational interest groups. Public relations articles
were published and key school people were included [...]
and as members of the many review panels used [...] the subject
matter objectives and exercises. Developing the [...] and political
support thus occurred simultaneously. [...] existence, virtually
all ECAPE’s funding, [...]2 million, came
from the Carnegie Corporation and [the Fund] for the
Advancement of Education. The only exception [was] a $100,000 grant from
USOE to the University of Minnesota, to support conferences to
consider whether any of the Assessment exercises might [...]


On July 1, 1968, ECAPE dropped the [...]
the Committee on Assessing the Progress of Education [...]
moved into its operational phase; between 1968 and 1970, [it] spent another $3 [...]
moneys, [...]
the new members were recruited from educational interest groups, including
the American Association of School Administrators, the Chief State School
Officers, the National Association of Secondary School Principals, the Department
of Elementary School Principals, the National Education Association,
the American Federation of Teachers, the National Congress of Parents
and Teachers, the National Association of State Boards of Education, and the
National School Boards Association.

The search for an appropriate permanent administrative sponsoring agency
began in earnest in the summer of 1967 but was not concluded until the
steering committee of the Denver-based Education Commission of the States
(ECS) voted to accept governance of the Assessment in June 1969. It was
clear from the outset that the project would have to be administered by some
group that explicitly represented the interests of the education groups opposed
to initiation of the Assessment. The decision to ask ECS to take over
clearly was designed to convince the states and others that they would in no
way be hurt by the project. In fact, when ECS was first approached in 1967,
its steering committee voted 7 to 6 against assuming governance, largely because
it was felt that the ECS interstate COMPACT, which was intended to
promote harmonious relations among states and between educational and
political leaders, was just building its membership and any cooperation with
a program that might embarrass or offend some states might be injurious for
the COMPACT itself. After considerable negotiating, involving reassurances
that very little about the Assessment would prove objectionable, ECS voted to
accept governance of the project.

One of the many ironies concerning the Assessment’s evolution is that one
of the ECS steering committee members who opposed administering it in
1967 did so not out of fear that it would be too strong, but precisely because
it was being designed so that there would be nothing objectionable in it.
Leroy Greene stated:

we are being told that since some people have diabetes,
we can’t have any sugar in this report, and since others of us are
prone toward heartburn we had better watch out for spices, so
unless this turns out to be pablum it is not going to be satisfactory.
We are told by some of those who have gotten up here, “We
support this program. However --” and then the “however” went
on to indicate that it had to be satisfactory to the point of view
that they were representing, and I would suggest, then, we have a
great difficulty here because, among other things, we are concerned
that perhaps some will misinterpret this information.
Now, “a fact is a fact is a fact,” to quote Gertrude Stein, and if
we can present facts and then they are misinterpreted, it seems to
me one of the prime requisites of an educator is to educate, and if
we can’t straighten out misinterpretations we are in pretty bad
shape, particularly those among us who are educators. If they
cannot defend their own positions, perhaps it is because they are
not defensible.

I would suggest if we are compelled to accept the limitations
that are being suggested here, then I for one would move that we
have nothing to do with it whatever.10

Despite all the problems of building support and all the debates about
what the Assessment should and should not be, when operations actually
began in 1969-70, the nine guidelines laid down by Ralph Tyler and other
key backers in 1963 and 1964 were still the dominant guiding principles.
Similarly, objectives 1 through 4 (see page 13) were still intact, although by
this time the importance of differing expectations had increased among various
groups who thought they knew how the project might benefit them. As
the following chapters indicate, some of these expectations were in direct
conflict with each other, while others were in sharp contrast with Tyler’s
relatively humble sense of what the Assessment could accomplish.


10 [...] of ECS Steering Committee, Denver, Sept. 29, 1967, pp. 82-[...]



3 


DIVIDING KNOWLEDGE
INTO SUBJECT AREAS


Once the National Assessment was a reality, ECAPE’s
next concern became the question of precisely what would be assessed. This
question was potentially a complex one, given Ralph Tyler’s early insistence
that in order to measure general levels of knowledge, the Assessment would
measure what the American people learned outside the schools as well as
within them. From this broad recognition that institutions other than the
schools are critical educators, it would have been possible for the Assessment
to define knowledge in relatively broad terms and to focus on units of measure
other than the conventional subject matter areas taught in most schools.
That it did not and instead assesses ten distinct school subjects is a matter of
considerable interest.

Examination of the early transcripts of policy meetings and ECAPE discussions
gives no exact picture of when and by whom it was decided to
address primarily school subject matter areas. Perhaps it is misleading to
think of this as a decision at all. The subject areas were inevitably present and
attractive from the beginning. Yet the transcripts reveal many more imaginative
and searching conceptions of what constitutes the “knowledge” of the
American people.

Though it may be impossible to say that a decision to neglect these conceptions
in favor of the simpler scheme of subject areas was ever individually or
collectively made, the deliberations in these early meetings reveal a process
that seems to have played an important part in other ECAPE decisions. This
process involves whittling away at more complex conceptualizations until
simpler, sometimes simplistic categories are reached. It involves, too, an inevitable
foreshortening of historical perspective in favor of the more immediately
expedient.

In examining some of the deliberations about what was to be assessed we
can also see the context of political and educational issues within which the
deliberations took place, a context which had implications for all important
aspects of the Assessment: the relationship between the definition of knowledge
and actual school curricula; the varieties of curricula and their purposes
throughout the nation; the hope held by many national educators that the
Assessment would help remediate educational inequities and the fear by local
school officials that national norms would arise; the question of where responsibility
lies for different aspects of education for the nation’s children, in the
home, the schools, and other arenas within the community.

While specification of the ten subject matter areas may seem [an]
efficient way to define the knowledge to be assessed, [it is fraught]
with numerous problems, political and methodological as well as theoretical,
many of which the Assessment still is struggling with today.


THE DEFINITION OF KNOWLEDGE
AS SCHOOL-RELATED SUBJECTS

Tyler’s original statement concerning a [...]
was not logically wedded to the subject matter [...] to the notion
[...] construed exclusively in the school s[...] analogy, it is clear that
[...] Flanagan of the
[...] for Research and the [...] possibility that a national
[...]

recent years have shown substantial agreement on the broad general
objectives of elementary and secondary education. Although
the terminology varies from one report to another, the general
aims are usually stated as developing the individual’s maximum
potentials and producing free and responsible citizens. In accordance
with most recent thinking in regard to public education these
statements refer to individuals and not to fields of knowledge or
disciplines.1

Moreover, the division into subject matter areas did not necessarily follow
from the original 1867 mandate of the Office of Education: to make an
annual report to Congress on the state of American education. As has been
noted, it was the existence of this mandate, long unfulfilled, which Commissioner
Keppel used as the justification for asking Carnegie to begin to consider
a national assessment. As we shall see, many voices were raised during the
early planning meetings convened by Carnegie and then by ECAPE against
using subject matter divisions. These voices presented a range of suggestions,
from urging the project to view education as a means of developing functionally
literate citizens to warning of the difficulties the Assessment would encounter
if, in choosing some subjects and omitting others, it provoked the
lobbies of excluded subject matter teachers and curriculum specialists.
Records of all the preliminary meetings indicate that these arguments were
reviewed and that considerable time was spent discussing the inclusion of
extra subject matter items, such as students’ self-concept, general problem
solving ability, moral attitudes, and more general conceptions of knowledge.
But these issues never became part of the Assessment. Why this was so will be
considered later. First, it is important to understand that Keppel and Gardner
originally began to think of the Assessment within a context that had
considerable consequences for the division into subject areas.


THE ASSESSMENT AS A MEANS TO REDUCE 
EDUCATIONAL INEQUITIES 


Although Tyler continually spoke of the Assessment as a census, it seems
that he never really conceived of it as a device to gather information in the
relatively disinterested or indirect way that “census” implies. From the outset,
he made analogies between the Assessment and those indices that show
the existence of unemployment and typhoid. It seems clear that as he construed
it, the Assessment would be a means for locating areas of unequal
educational achievement, similar to the pockets of unemployment that can be
isolated through Department of Labor statistical surveys. A census of knowledge
as Flanagan viewed it, one through which the society would self-consciously
examine its values and goals through assessing the knowledge, skills, and
attitudes held by its citizens, was evidently subordinated from the start,
although several participants supported it throughout the meetings.

1 “Evaluating the Effectiveness of Elementary and Secondary Education,”
Carnegie files, p. 1.

Perhaps such a conception as Flanagan’s would be seen as fanciful and
luxurious at any time in history, including the present. Certainly in the
early sixties it would have been considered impractical in light of what were
then perceived as the urgent needs of American education. Powerful and
conflicting pressures were being brought to bear to redirect greater resources
to the education of elites and to equalize educational opportunity for the
masses. Admiral Rickover’s campaign to sustain and [...] the
“cream at the top of the educational milk bottle” was followed by the Kennedy
administration’s commitment to “churn the pale milk [...] at
least the consistency of the middle.”

For those concerned with educational inequities, improving the quality of
schooling was the obvious task at hand. In those days before the Coleman
Report, comparing levels of achievement [...]
meaningful way to prove to the country [...], in particular how
[...] had to remedy the
[...] positive educational
policy [...]

MR. HENRY S. DYER [... Program,
Educational Testing Service]: [...]
I ask a question of Frank [...] do you think of
state by state comparisons, or a[...] of just one national
index?

MR. KEPPEL: Probably [...] the former than the latter. I
think my response [...] still do believe in centers of
initiative, quite widely diffused, [...] that [...] use a competi[tive]
[...] going to fuss around
[...] will turn out to be somewhere
in Missouri.

MR. TYLER: The types of questions you raised would involve
some other types of breakdown.

MR. KEPPEL: Of course it would, the breakdowns of the cities.
You can cut this one a half a dozen different ways.

MR. DYER: But you’d have comparative data.

MR. KEPPEL: My enthusiasm for having a national mean can be
restrained. I am more interested in using the national competitive
quality of it, you see, if this can be used that way to push the
whole damned thing up.2

Keppel’s enthusiasm for such a strategy led him to encourage the others
present to do their best to develop a good method for uncovering specific
areas of educational malaise while he took care of the politics of gaining
acceptance for the Assessment, a more difficult task if schoolpeople felt
threatened by pressure for “a national mean” or by the possibility that their
own failures would be exposed.

It was within this context of avoiding antagonistic relations with schoolpeople
that the accomplishments of the schools became the Assessment’s focus of
measurement. Participants throughout this first meeting expressed concern
for distinguishing between what is learned in school and what is learned
elsewhere so that “inadequate” school systems could receive more aid. Nowhere
is this better illustrated than in the following exchange:

MR. JOHN W. TUKEY [Princeton University]: I would like to
say, I don’t see why we have to fail to regard the television commercials
as part of the educational system in the country, if we are
to be realistic. We may not like it as a part of the educational
system but, if this is what educates people, I think we need to be
concerned with it.

MR. FISCHER [Columbia Teachers College]: Don’t we need to
distinguish between what they learn from the commercials and
what they learn from the teachers?

MR. TUKEY: Why?

MR. TYLER: For some purposes, but not so as to say that this
country has a relatively low educational level.

MR. FISCHER: If we want to blame it on somebody, we’d need to
distinguish.

MR. TUKEY: What is this responsibility to report: on the status
and progress of education or on the school system?

MR. KEPPEL: “Education,” I think, is the word used.

MR. [...] [USOE]: “... in order to improve the school system.”

MR. FISCHER: If you looked at the intent of the Congress in the
year the Office was set up, they meant the school system.3

2 Proceedings of the National Testing Project Conference, Dec. 18-19, 1963, Carnegie
files, pp. 12-13.

Thus the perceived major uses of the Assessment to some extent dictated its 
nature Although it might not be possible to isolate the effects of schooling 
completely, the Assessment was to be useful to school systems and its content 
was to be primarily school-related However, the selection of discrete school 
defined subject matter divisions did not, even then, necessarily follow Many 
persons present, including Tyler, recognized that (duration included much that 
was imparted by media other than the schools-televiston, for example-and 
were mterested in evaluating such knowledge as well Partly they had a 
genuine desire to develop a comprehensive definition of “knowledge , pa y 
they simply recognized the difficulty in separating out what was learned in 
school from what was learned in other ways , 

It was not until late in the meeting that the division into specific subjects
was broached. This occurred when Gardner, who had, perhaps, assumed such
an approach all along, said the following:

Since you have brought up [...] terms of subject matter [...]
covered. If we are talking about the schools now, what subjects
should be dealt with [...] tests, every subject covered by the
schools?

It is interesting that Flanagan, who [...] of Project TALENT, was indeed
talking about [...] subjects. In fact, he was [...] involved in teaching a
discipline, [...] science or mathematics, and pointed out the difficulty of
distinguishing among mixtures of various skill processes (knowledge,
concepts, application of principles) if the Assessment were to work within
traditional subject [...].

But after Gardner’s question, the issue seems to have been somewhat settled,
at least in the minds of the Assessment’s founders. Whenever subjects or
content-areas are spoken of during [...] the meeting, they are considered
only as school-defined areas. Whether the Assessment should simply


3 Ibid., p. 18.

4 Ibid., p. 102.

5 Ibid., pp. 100-102.





stick to the more basic subject areas or include a wider spectrum of
curriculum offerings, some of which would not be available in all schools, was
vigorously discussed. A decision on this matter would raise the issue of the
Assessment’s impact on the schools’ conceptions of which subjects were
important. If only the basics were evaluated, programs such as those in art or
music might be seen as less important and might receive less local support.
But if the Assessment included only some of the nonbasic subjects, the
lobbying from curriculum specialists in excluded areas might become fierce.⁶
Discussion turned likewise to the Assessment’s impact on the ways in which
subject areas would be construed by the schools. In fields such as social
studies, mathematics, and language, considerably different approaches prevailed
throughout the country. Whatever approach the Assessment used as the basis
for its tests would influence the schools and might provoke great controversy.
After the initial meetings these issues were broken down into five major
points for those in the Assessment to keep in mind as they decided on
strategies for classifying subject matter areas:⁷

Content of Tests

A. Range of subjects to be covered. The tests should fairly
reflect the aims of education in the United States, including
both the traditional and modern curriculum.

1. Greater coverage of subject matter would reduce the
impact of the program on any one field.

2. Greater coverage of subject matter might stimulate
schools to venture into areas covered by the test, but
which would be new to the school.

3. Greater coverage of subject matter might give more
schools an opportunity to excel in something, thereby
reducing the possibility of invidious comparisons.

4. Greater coverage of subject matter would lead to a
greater danger of harmful impact (or possible lack of
local cooperation) in those fields (for example, social
studies) about which there is less agreement among
schoolpeople regarding content or approach.


6 Flanagan’s approach in Project TALENT was to test items of “general
information.” During the discussion of the lobbying that might accompany the
Assessment’s choices of subject matter areas, he pointed out that if it followed
his format the National Assessment could avoid such problems.

7 “Summary Report: Two Conferences on a National Assessment of Educational
Attainment,” prepared by David Goslin of the Russell Sage Foundation, Carnegie
files, p. 4.






5. At the individual level, greater coverage might lead to
more frustration whenever children have not received
instruction in a subject covered by the test.


To measure educational inequalities, the Assessment

would focus on learning that occurred in the schools. Subject matter divisions
were the most obvious means for making comparisons, and future decisions
on choosing those subject areas or on choosing among disparate approaches
within a subject area would depend upon possible impact on the schools
themselves.


MORALITY, FEELINGS, AND ATTITUDES:
THE “INTANGIBLES”

Even though school subjects were generally agreed upon as the primary
unit of dividing the educational pie, another general area was suggested for
the Assessment to take seriously. John Holland, of the American College
Testing Program, proposed that attitudes concerning schooling, educational
aspirations, and self-concept be included:

I’d like to go back to what Lee Cronbach talked about, and we
have been talking about, generally, the fact of doing the
assessment of the 1870 curriculum. I think one thing that we might do
would be to ask for things which would offset this, not necessarily
subject matter, that would suggest the school has some other
function than teaching people that they should know such and
such, which really I have some doubt about how useful this is. For
example, the sort of item I think you might ask would be attitude
toward school, the degree to which they like school.

Do they read on their own time? There are lots of students who
simply don’t read except when they are in school. Well, school
can enhance at a particular level the amount of reading done. I
think this is a reasonable sort of thing. What are the levels of
educational plan, how far do they plan to go in school, when you
control for the background?

Another notion which is a kind of personality item but which
might be something you might be able to get away with is
students’ self-conceptions as to how they rate themselves on their
abilities and so forth and so on. There are schools which crush
kids; it would be of considerable interest whether kids in “School
A” see themselves positively or not. We have done a recent study
which suggests that kids of high talent who go to certain kinds of





schools do less than we have expected them to do, and these are
the very schools which everybody thinks are the finest in the
world. I am sure it happens at other levels.

Well, I think it’s possible, I have been daydreaming on the side,
here, to find some simple, crude, nonsensitive items which would
take the heat off the subject matter.⁸

Holland’s proposal that the Assessment might measure areas that went
beyond school subjects was expressed in terms of what such inclusions might
do for the Assessment, e.g., make it more modern than critics might say, help
it pass through the controversy that might be aroused if school subjects were
the only focus. In later meetings, when the matter was raised by educators
consulted by ECAPE, the tone was very different. These educators did not
champion the inclusion of such areas because they would help the Assessment
become a reality but instead justified their inclusion because the educators
regarded them as the most important outcomes of education. The areas
involved such qualities as creative thinking, critical thinking, curiosity, lifelong
interest in reading, respect for the rights of others and for the law, the
development of risk-taking behavior, and positive self-concept.

As the participating educators discussed these attributes, habits, and
values, they stressed the increasingly important role of the schools in fulfilling
what once was considered the moral function of the home and other
community institutions. Tyler seems to have been reluctant to acquiesce to the
schools’ responsibility for transmitting, and especially for changing, students’
values; in addition, he emphasized the difficulty of measuring the
achievement of such “intangibles,” even if agreement could be reached on their
importance. The educators concurred that “objective” standards would be
difficult to formulate, but insisted that it would be more useful to provide
unfinished instruments that could be improved than to leave out the
intangibles entirely. The tenor of the educators’ concerns and Tyler’s responses are
well illustrated in the following exchanges:

8 Proceedings of the National Testing Project Conference, pp. 123-129.

FATHER CURTIN: I think it’s tremendously important we
begin with the difficult things to measure which are important to
society. It doesn’t make any difference, it seems to me, whether a
child can learn the mathematical computations with great
accuracy, almost by rote, if he uses it to steal.

I think society is more concerned about whether he uses that
correctly and intellectually and as a citizen in a community rather
than as an intellectual gimmick, and then, in trying to line
[...] values, the ideals that the school is

supposed to teach, how much did the school teach, how much did
the home teach, how much did the community teach? What is the
relative contribution of each of these agencies to the development
or lack of development of these attributes?

The school may do a tremendous job of teaching these values
which are completely abrogated at home or by the community.
... I certainly feel that we should get to the difficult things first.
It’s easy to get achievement in the content subjects, it’s a matter of
a test and a good, scientific sampling, but to get at the intangible
outcomes which build our community is an important factor.

MRS. GREENFIELD: I’d like to see us begin to tackle these
because I believe that the public school system is being called upon,
more and more, particularly in the large cities of the United
States, to assume the responsibilities that once were thought to be
solely the responsibilities of the home. Certainly this is true in the
inner city, and I think, if we really could go to a discussion of
whether or not as people interested in education we think this is
something that, one, is desirable and, two, will be continued,
whether it is desirable or not, [we should ask] how, then, do you
measure some of these. ... Should these be part of the output
that you expect in a school, that a child is taught to be honest in
its simplest terms, if he doesn’t learn it in his home?

I’d like to see us discuss, list and discuss some of these things
because, well, I don’t think I have to argue why in this room!

FATHER BEHRENS: Do you think the Carnegie study in process
right now ...

CHAIRMAN TYLER: The study of Catholic education?

FATHER BEHRENS: Right! Do you think that some of the
instruments we have used here in the Duke studies of parental
expectations might lead us in our consideration of structuring an
instrument to be used for this group?

FATHER CURTIN: Well, it’s part of the reason I raised this
question. My understanding of the study at this time is that
Educational Testing Service is having a very difficult time trying to
evaluate the outcomes, first of all, to get a correct measure of
value, secondly, to determine whether or not the value was
learned or not learned because of the school or because of the
home or because of the community. Does the community make a
great contribution to the value system? If the child lives
downtown, he gets one value system; if he lives in a very fancy suburb,
he gets another value system, and the one in the fancy suburb
might be worse than the one downtown! It can go [...].

But I think that we’ve then got to find out ... since the school
is the only instrument that we have in formal education.





CHAIRMAN TYLER: I am sort of puzzled, because it’s certainly
true from what we know about the acquisition of values that the
school can become part of a consistent system of the home and
community in helping, to use the anthropological phrase, to
socialize youngsters, to develop moral values, how you behave in a social
group, the responsibility for others, and so forth. But when the
home and the community set up quite contrary values, is the
school an effective agency? Do they really try? Can they do much
in that regard? This is one of the questions.

FATHER CURTIN: That’s the question I raise, Ralph, right at
this point.

CHAIRMAN TYLER: “All the things you have been learning at
school is sissy stuff. We don’t act that way.”

FATHER CURTIN: The peer group doesn’t act that way.

MR. CARROLL: Respect for law, for example!

CHAIRMAN TYLER: And, if the schools are actually undertaking
this responsibility, Bill, how do they do it in a way that enables
them to transcend the influence of home and community, and the
church and school would seem to be about the only ones in some
communities where these are going to be taught.

MR. SPEARS: We’d like to have some instruments to play around
with, rather than somebody just develop them and not show them
to us, and then make us take the test and then have us have to
defend ourselves. We’d rather be a party to it.

MRS. GREENFIELD: We do it in broad, general terms when we
say “good citizenship,” but it’s the kind of thing that is accepted,
I think, in the country as one of the goals of the school. But a
school that taught good English and math and didn’t teach a
child to be a good citizen in a democratic society hasn’t
succeeded, and certainly it’s something that they ought not to leave to
chance, in my view. We ought to know whether we are doing it or
not.

MR. ROGIN: We have greater difficulty getting consensus on
what that term means than we do about successful reading.

CHAIRMAN TYLER: And I am also raising the additional
question, “If we got consensus, is it proper to say that this is a function
of the school, alone, if the community is really concerned with it,
and doesn’t it have some responsibility to do something about it,
the community, roundabout?”

MR. SPEARS: We can use the test to reflect the community back
to the community. They won’t expect us to be doing all that and,
if we can show them up a little in their public relations, I think it
might help.

CHAIRMAN TYLER: First of all, if I recall some of the data from
the Catholic education study, the parental aspirations and the
youngster’s achievement are fairly well correlated.





FATHER BEHRENS: Yes.

CHAIRMAN TYLER: What the adults do [...] expect [...] children
does have an influence, apparently. But this
means that, if we are going to do that for education generally, we
have to get some measures of what the community atmosphere is
[...] what we call “good citizenship” in a negative community
has quite a different meaning from where all of the citizens
are consistent with it.

MR. KIERNAN: Take the concept of self-image, say, in the ghetto [...]

CHAIRMAN TYLER: Our committee might recommend, if you
felt it was important, that one of the early studies to be sponsored
and supported would be one with some intensive students, and
you can come into these where students would be happy to have
this done. But it’s by no means an easy job, and I am really
raising the prior question, to what extent do you school people
think that your responsibility is to change the moral climate of the
communities in which your schools are located, because that’s
really what you are asking when you talk about having the
youngsters different from the parents or the places outside the
school in that community.

MRS. GREENFIELD: I serve on the school board in Philadelphia
and I must say that I think any large city school board has to face
this problem, and most of them are facing it and making the
decision they must enter into this field, which has not been a
conventional educational field in the past. But I happen to believe
that the future of the country depends on intervening, so I don’t
think you have much choice.

CHAIRMAN TYLER: I would think there’s another alternative,
Mrs. Greenfield, which is to develop other community resources,
including or in addition to the school.

MRS. GREENFIELD: Oh, I don’t think the school should do it
alone.

CHAIRMAN TYLER: The mobilization of youth, not just the
school.

MRS. GREENFIELD: I didn’t mean to imply the school was the
only agency, but ...

MR. SPEARS: Aren’t we getting back to the point of the danger
listed in here, in your sheet you sent out to us, the danger you get
into the areas that are easier to operate in and that we won’t get
the total and won’t get the measurement of the total objective of
the school?

CHAIRMAN TYLER: Well, that is a danger, but my own talk was
to be sure you seriously meant that this is one of the objectives you
work on, and how do you work with your teachers on it and what
do you do about it. I am sure how you work some of the things





you do in reading, arithmetic, and science, but what do you do in
connection with the development of moral value if the community
has quite different moral values?

MR. SPEARS: You will come out with it in your input as a part of
a standard of what the school, after all, should be putting in, but
you don’t tell what it is and let the schools see themselves a little
better, and you go as far as you can, giving us instruments to help.
That’s all you do.

I don’t think there is going to be anything totally objective
about the whole operation. You just can’t leave those areas out,
because they are difficult. We are struggling with it, all the
time: attitudes of children toward other children, their feelings
toward other human beings.

CHAIRMAN TYLER: I understand that, but am also conscious of
the time when I was Dean of Chicago, of the quite proper
question raised by the schools, “Why should they be blamed for the
delinquency of the community, where the whole community
set-up was teaching it?” The school wasn’t teaching it. So, can you
expect them to teach morality in a delinquent community?

MR. SPEARS: We had six hundred teachers in a course last
semester, with an office and an assistant superintendent in charge of
it, and we’ve got all compensatory education put under his office,
as curriculum. He is in the area of curriculum, whether he goes on
the high school assembly stages for panels with superintendents or
somebody else. We are in the field right now and, unless a
movement such as this is coming at the national level and will
appreciate these things that we are in by making an effort, we are just
going to be out there floundering by ourselves, as individual
school systems.

CHAIRMAN TYLER: If this group agrees this is an important
part of it, there are certainly indices of behavior for a community,
for a sample of children, youth, and adults, that can be applied,
that can be improved as time goes on. But, certainly, we know
better how to do it and assess it than we apparently now know
how to teach it.⁹

Tyler continued in the course of this meeting to articulate the principle
that the schools should not be held accountable for changing the moral
climates of communities or for transmitting values that were not transmitted
through the total life of a community. But many participants firmly
maintained that their school systems were now involved in teaching civic and
moral values and were being held answerable for achievement in this
arena.


9 Proceedings of ECAPE Conference with Superintendents and Administrators,
Sept. 22, 1964, pp. 63-68.





In taking his position about the limited role of schools in affecting values,
Tyler introduced another point which he mentioned only in passing: the
relationship of the community’s aspirations for its children and its
socioeconomic makeup to the achievement of even the tangible items, such as reading
and mathematics. Although the Coleman Report had not then been
published, similar conclusions about the importance of socioeconomic
background pervaded the meeting, if only as hunches. But these hunches took a
strange direction. While many participants mentioned the importance of the
family’s and community’s influence on the schools’ outcomes, what they
stressed was the school’s responsibility for changing the community’s values
by educating children into “better” social values. This was not merely an
indication of the hubris of these particular educators but a reflection of the
fact that, at least at this point in history, many educators saw themselves as
educators of the whole child in the community, as it were.

Tyler did at least give the appearance of seeming responsive to [...]
some of the more difficult intangibles, if enough educators agreed they were
important:

FATHER CURTIN: I’d like to raise [...] with the committee’s own
[...] getting at certain outcomes with some [...] validity and
reliability. For instance, you mentioned attitudes [...] certain
values to an educational system which are [...] to be measured:
social values, which involve a racial [...]. There are certain other
outcomes that are highly intangible: creative thinking, critical
thinking, such things as an [...] reading, not just reading
achievement but an interest in reading that matures as the person
grows older, through the grades.

These and other intangibles [...] of the committee, can these be
measured sufficiently to warrant the measuring instrument we are
discussing?

CHAIRMAN TYLER: What our committee planned to do is to [...]
think ought [...] what they can come up [...] us.

Now, for example, let’s take some of those that you [...]. Some of
those are more easily [...] invasion of privacy than [...] find out
whether





youngsters are interested in reading and whether they are
actually doing reading beyond that which is actually required for the
school. It’s much easier to find out without invasion of privacy
than the question of “How do you feel about persons of other
races or of other religions or other groups?” things of that sort. So
I am sure that the questions you raise are different in the degree
to which the kind of assessment is easy or difficult.

But, if you say, “We think you should include these things,”
then our next step is to try to find out from what then we would
call pilot contracts the possibility of getting the devices, and I
mentioned test people but we also planned to explore with persons
like the Survey Research Center at Michigan and the National
Opinion Research Center in Chicago and the Bureau of Applied
Social Research at Columbia and the Survey Research Center at
California, kinds of devices that could be used. ... You’d not
take lightly the business but would try to see how far they could
be measured before making any recommendation to any ongoing
project after this exploratory committee is over.

MR. JOHNSON: I feel very strongly that such an index should be
developed and should explore new areas, insofar as possible,
perhaps the chief among them curiosity and an inquiring mind. I
think there is no greater prerequisite to success in any field than
an inquiring mind so as to improve procedures.

MRS. GREENFIELD: In that particular context, if there were a
way of finding out what influence a school had on a particular
inquiring mind, in other words, is that mind encouraged by his
school experience or does it forget about it and stop asking
questions, because of the kind of teaching he has, because of the
climate of the school? These are the kinds of elements we think are
just as critical, as I said before.¹⁰

THE “INTANGIBLES” BECOME CITIZENSHIP 


While many of the complex questions raised by these educators might have
mired the Assessment in irresolvable difficulties, at least two of their concerns
might have led to clarification of what the Assessment was trying to
accomplish. The first was their point that in evaluating important personal
qualities, such as honesty and respect for others, the Assessment would inevitably
be involved in measuring the values of the general social community. Taken
to its logical conclusion, this can be extended to show the inevitable
interrelationships between certain “intangible” qualities, such as intellectual curiosity
and attitude toward school, and more tangible outcomes of achievement in


10 Ibid., pp. 78-79.





school subjects. In other words, even if the Assessment measured only
achievement in school subjects, it would by definition be investigating
community values, because of the interconnection between socioeconomic factors,
values, and educational achievement. Secondly, the educators took seriously
the Assessment’s intention to measure attributes of the adult population and
therefore pointed out that many of the qualities considered to be crucial for
success as an adult were not necessarily related to school subjects.


These issues do not seem to have been raised seriously in subsequent
meetings of ECAPE members; the momentum was great to go directly about the
business of measuring. Since contractors were to be hired to produce the
instruments, the terms of assessment had to be manageable; since the schools
were seen as both the means and the constituency for the Assessment, their
terms of reference were predominant. Eventually, instead of really dealing
with the possible interrelationships among the intangibles and [...]
subject areas, the committee created the nonsubject area of citizenship,
an umbrella under which some, though not all, of the intangibles could be
included.

Tyler summed up for the members of ECAPE the fields discussed in the
various discussions with schoolpeople:

CHAIRMAN TYLER: Well, let’s consider [...] have been discussed
in these two [...] conferences. They have been reading, the
language arts [...], mathematics, science, the several social
studies [...] and the like, health, and whether or not it is treated
with, [...] as part of science, vocational education, the fine arts
(music and [...] arts under that), and, either treated alone or as
[...] studies, citizenship, which refers to such [...] for law and
order, concern for the rights of others and that type of thing, and
then for a moment considering any [...] as involving one or more
types [...] understanding, ability to use this knowledge or
understanding in problem solving or thinking through, analyzing
problems, interest in [...] field, [...] in many cases, attitudes
[...] would be applied where relevant. [...] six or seven you think
[...] look at these nine here [...].

MR. JOSEPHS: I have eight.



MR. WITHEY: Reading, mathematics, science ...

CHAIRMAN TYLER: If you put “language arts” as reading,
under it, then you have eight. Actually, in the elementary school,
somewhat I think unwisely, perhaps, but because reading is
considered so important a problem, they tend to be separated, and
language arts, involving writing, speaking, and so forth, is usually
treated separately.

MRS. SMYTHE: Where do you put the intangibles that came up,
such as the moral values, honesty, the self-image the child has,
regard for scholarship?

CHAIRMAN TYLER: I’d put them under the heading of
“citizenship,” broadly defined, but that’s just a catchall at the
moment. Perhaps the best way to start out is, which of these would
you leave out?

MR. BARNES: I suspect the contractors would have some rather
strong notions about which were preferable to include, wouldn’t
they?

CHAIRMAN TYLER: They would find “citizenship” most
difficult to deal with.

MR. JOHNSTON: The whole area of social studies?

MRS. SMYTHE: This is why I felt so strongly, this is the kind of
decision we can’t contract out, that we’ve got to make, even if it
takes extra staff to do it, to do the necessary work.¹¹

The discussion continued until it was agreed that all the major fields of the
school curriculum were to be assessed.¹² Steve Withey of the Survey Research
Center, University of Michigan, was commissioned to prepare a document
for a later meeting, which would list all these areas and the controversial
aspects of the objectives in each.


WHO SHOULD CONTROL DECISIONS ON WHAT TO
MEASURE?

The difficulties of choosing among certain conflicting points of view within
each subject area were argued at length before adjourning in September, and
Mabel Smythe, principal, New Lincoln School, New York City, continued to
urge that ECAPE itself take on the responsibility of making most decisions
rather than leaving them to contractors. Yet, by the time that Withey’s
summary was presented at a December 6, 1964, meeting, there seemed to be little
desire by ECAPE members to become directly involved in selecting subjects
and defining objectives.

11 Proceedings of ECAPE, Sept. 23-24, 1964, pp. 323-327.

12 Ibid., p. 331.

Tyler spoke about the necessity of including actual teachers in the
formulation process and now emphasized the fact that there were common objectives
within each subject on which it would be possible to reach eventual
agreement. The problems involved in testing the intangibles were taken up in some
detail, and again Tyler stressed only assessing understandings and attitudes
which could then be measured. But Mrs. Smythe continued to object to
giving control to the contractors, who might “be tempted to choose those
areas where the instruments are easier to come by, simply because the job is
so big.”¹³ Tyler continued to insist that participants at future conferences
(members of the teaching profession, curriculum specialists, and potential
contractors) would resolve whatever conflicts existed. If the contracting
organizations “see this as something that is desirable, that they have a [...]
in; that it can be a real contribution to their own development, they would
be anxious to involve their ablest people and not to [...] done, as somebody
suggested, by second-rate people and only viewed as a minor part of their
main job.”

Mrs. Smythe persisted in pressing her point that the [...] might be
making crucial definitional decisions on the nature and content [...]
areas, decisions which ECAPE itself should control:

MRS. SMYTHE: I am a little confused. We are -- we are [...]
people to develop instruments [...]
what our objectives are [...] “reading interests” [...]

CHAIRMAN TYLER: Well, if you call [...], but that’s the
level at which I think we’ll be ready to proceed after the seminar.

[...] I am really saying, in addition to having broken [...]
materials which we have, [...]
we had a summary of the literature, the literature is not unanimous
in agreement as to what education is attempting to do.15

Tyler tried to reassure Mrs. Smythe that the elaborate process he envisioned
would involve himself and other ECAPE members from its inception to its
end. But she persevered:

MRS. SMYTHE: What I was groping toward was the basis on
which we would evaluate the instruments after they’d been developed.
We would want to look at these things brought to us next
July and say, “Now this one better meets our objectives than that
one,” and, in order to do this, we would have to have some reasonably
clear consensus of what objectives they were expected to
meet. Perhaps I am asking for more direction for the test
developers than is necessary, but I was thinking, if we hand it to
someone, this document, and say, “In general, these represent a
consensus, they’re all interesting and reasonable approaches, see
what you can do to fulfill them all.”

CHAIRMAN TYLER: They wouldn’t be able to find their way
around, that way.

MRS. SMYTHE: Yes, it would be too much for them.

CHAIRMAN TYLER: I think of one that summarizes down to
outlining the content areas and the actual habit and participation
[that] evidences they are really doing these things outside the
school classroom and then working with teachers would spell
these out still further. Then it would come back to us for further
criticism.

MRS. SMYTHE: I think if they had you at their elbow, they’d be
in no difficulty. I just wanted to be sure when they get away from
you, they will have something to refer to.

CHAIRMAN TYLER: Well, they shall, that’s the kind of thing we
will have to prepare for such a seminar. It’s just that I am getting
a little disturbed about how much work there is to this thing!
(Laughter)

MR. CORSON: Aren’t we saying, really, that in each of the areas
that you have defined, there are no clear, readily agreed upon
objectives at the moment?

CHAIRMAN TYLER: All written out? No, that’s true.

MR. CORSON: All written out, and of which you might say to the
contractors, “This is what we want you to measure,” and, lacking
that, you have to say to the contractor, “We have canvassed what
objectives there are to be found, we have put them down, here,
we are looking to you to find devices to measure.”


15 Ibid., pp. 168-170.






CHAIRMAN TYLER: We are holding a seminar to help clarify
and get the essence on citizenship and recording it as part of our
general understanding. Then we get agreement that Mr. X takes
responsibility for this part of what we have clarified, and then he
begins to pursue this farther and work it out in more detail in
cooperation with school people.

MR. CORSON: Might we not come up with, at the end of the
seminar, with saying that, in the field of citizenship there are a
dozen different objectives that are obviously held by some in that
particular field. We find ways of measuring eight out of the dozen.
We don’t find ways of measuring the other four and, in working
with the schools and measuring those eight, you may develop
ways to get at the other four then, or you may not.

CHAIRMAN TYLER: That is, it’s quite possible.

MRS. SMYTHE: I think we have to recognize, too, there are [...]
sets of objectives. One is our objectives in making this assessment,
in doing this survey altogether, and I think [...]
are much more easily arrived at then than the objectives of
American education as a whole. I don’t see that we [...] precise
what are the educational objectives of all Americans in
summary form, because one of the things we can do is to get
a sense of the range and varieties in emphasis from [...]
country to another. But our own [...] that
[...] stated, I think, to find out what is the range of [...] what
is the range of information, what is the level of [...] in various
areas which are defined, already, for us [...] not
saying this is a definitive statement of our objectives [...] it gives
some idea.16

These lengthy extracts show clearly the [...] of the [...]
was at this point willing to discuss internally. Here, [...]
subject matter objectives as they were [...] explored and the [...]
the differences of opinion within current [...] chapter shows, [...]
related to the school’s definition of them.


16 Ibid., pp. 172-176.





CHAIRMAN TYLER: We are talking of the commonest classification
of the schools in the curricular area -- the language arts for
the elementary schools, which is usually broken up in the high
school into English and foreign languages, but commonly called
the language arts at the elementary level, and mathematics, natural
science, social studies, and you’ve got health and physical education.
Then you get into the question as to whether you treat the
various occupational objectives separately, but, generally, I would
see vocational assessment of the kind of assessment we are talking
about here, as of two sorts -- one, the understanding of the youngsters
about the world of work, what kinds of jobs are available,
and something of the sort of education required for the different
sorts of jobs. This is knowing their way around the world of work.
Second, having some occupational skills, if they are not planning
to go to college to develop them.

Then, finally, we have the fine arts area, which may or may not
be separated into music and the visual arts.

I believe this represents -- is this right, Paul -- this represents the
usual area of the public school.

MR. JOHNSTON: Yes.

MRS. SMYTHE: And under “social studies,” you include the
catchall areas -- citizenship, I presume?

CHAIRMAN TYLER: That’s where it’s most likely, the people
teaching social studies and the supervisors are more likely to be
concerned there, although there are some objectives that would be
schoolwide -- the concern with the habits of study and study skills,
and so forth, is schoolwide as well as being contributed to by
many other if not all the areas.17


Ultimately, as the next chapter demonstrates, the contractors came to
assume a major role in delineating the subject areas. The original ECAPE
committee became an advisory group, represented mainly by Tyler and Tukey,
and a permanent staff was developed to guide and mediate among the
contractors. Perhaps this was inevitable, even fortunate, if the Assessment
was to get off the ground. Certainly no consensus would actually have been
possible among all the groups and individuals consulted by ECAPE, although
Mrs. Smythe’s suggestion of a pluralism of subject matter objectives might
have been viable.

The educators who believed that the intangibles were central to an
evaluation of education raised significant ideas which never found their way
into the Assessment. But at other meetings more conservative administrators
pushed in opposite directions, toward assessing solely the most basic subjects
and in the exact terms most commonly held by the schools. It is evident from
the transcript that Tyler leaned toward this latter position and so influenced
the Assessment. There were many reasons for this, not the least of which were
the inherent difficulties in communicating to the general public if other
approaches had been taken, and the substantial political clout of the
established administrators and curriculum specialists. But a more fundamental
reason must have been Tyler’s own commitment from the beginning to a simple
census of educational outputs which would, once it was accomplished,
stimulate research into relationships among affective factors which were
difficult enough even to understand, much less to measure. In terms of the
Assessment’s acceptability to Congress and the states as a tool for
accountability, “Formal schooling is what you pay for,” as John Tukey said
in an interview at NAEP headquarters in Denver in August [...]


17 Ibid., pp. 136-137.


WHO WOULD DEFINE EACH SUBJECT AREA?

The lack of clarity evident in the ECAPE meetings as to who would
determine the detailed nature of the subject [...]
from the first, December 1963 [...] National [...]
Project. Tyler at that time proposed [...] through which [...] occur
[be]tween educational specialists and [...]

[...] TYLER: [...]
attempt to approve, disapprove, [...] and so
forth, and my view [...] the advisory panel
would represent two kinds of people -- one, the informed scholars
and teachers in the [...] about things that are
ahead that they think important [...] the other would be a certain
number of people who [...] like this group as being
persons with some wisdom [...] of what’s important in
adult life and would be able [...] ask the specialist at any point,
“Why would you include [...]? What’s its real relevance to living
a productive and satisfying [...] significant life?” and at the
same time [...] “Why wouldn’t you do something
about the Bill of Rights,” and so forth, so there’d be a continuing
dialogue between [...] responsible laymen and people
who have specialized in the area to try to see there’d be some
balance.

This may be quite unrealistic and I don’t know what the Commissioner
would have in mind as to how to get this done, but it
seems to me that’s one way of trying to see that what finally
survives is judged important by both groups -- the people working
intensively with it as teachers or scholars and the people who are
concerned about the product of American education and the
quality of American living, and so forth.18


The next chapter illustrates how the nature of such dialogue, as Tyler
conceived of it, and the principle of including in the Assessment only what
survived such dialogue had severely limited utility if anyone’s real interests
and values were to be represented. We use the term “Cardinal Principle” to
highlight the adamance with which Tyler retained his attitude that “what
the schools are trying to do” could meaningfully be reflected through the
process he proposed. Serious doubts are raised as to the ultimate wisdom of
Tyler’s stance on this important matter.


18 Proceedings of the National Testing Project Conference, pp. 133-134.




4 


THE SUBJECT-MATTER 
OBJECTIVES 


Once ECAPE decided to evaluate performance in ten distinct subject areas,
[...] began to define specific [...] the basis of which exercises could be
written and [...] chapter analyzes the process and under[...] NAEP’s
development of subject-matter objectives. [...] the guidelines formulated
by ECAPE and later NAEP appear to be clear and simple, [...] guidelines
and the process that ensued from them actually [...] contradictory premises
and procedures. [...] available working papers, memos, and rep[...] clear
that the subject-matter objectives as established for the Assessment’s [...]
cycle reflect these unperceived and/or unresolved confusions of purpose.

This is not to say that NAEP’s subject[...] worse than such objectives are
generally [...] they are not at least more comprehensive than [...]
notoriously difficult to produce subject-matter [...] useful for educators
who [...] involvement in that process [...] precede the generality and
abstraction that characterize most lists of objectives. It is this
abstractness which often makes such lists meaningless to those who might be
expected to want to use them. Another somewhat ironic peculiarity of
subject-matter objectives is that no matter how vague and meaningless they
are, their adoption by a recognized educational body lends them an aura of
authority. They exist, therefore they must be taken seriously.

The Plowden Report may have provided the first official debunking of lists
of objectives since their appearance as staples in the arsenal of planning and
evaluation:2


1 Major sources [...] Second Report, by [...] pp. 378-380, April 1967.

general statements of aims, even by those engaged in teaching,
tend to be little more than benevolent aspirations which may
provide a rough guide to the general climate of a school, but
which may have a rather tenuous relationship to the educational
practices that go on there. It was interesting that some of the head
teachers who were considered by H.M. Inspectors to be most
successful in practice were least able to formulate their aims clearly
and convincingly. [Paragraph 497]

It is difficult to reach agreement on the aims of primary education
[unless] anything but the broadest terms are used, but formulations
of that kind are little more than platitudes. [Paragraph
501]

The Assessment, from the beginning, was conscious of the hazards intrinsic
to any stated set of objectives for the nation’s classrooms. To avoid
meaninglessness, it insisted that its objectives would be behavioral, that
they would eschew indefinite phrases and directives, and that they would be
illustrated by exercises so that an additional level of specificity would
always be provided. Further, the objectives would include attitudes and
“understandings” as well as the usual content which each subject area hoped
to transmit to students.

Too, the Assessment was keenly aware that its very existence relied upon
putting to rest fears that a centralized curriculum-setting effort might result
from the subject-matter objectives. ECAPE’s and NAEP’s pronouncements
consistently denied that it intended to threaten anyone or anyone’s goals. To
this end, it insisted that the process by which its objectives were to be
determined would take into account the viewpoints of everyone concerned. The
assumption was that because the Assessment was setting out to gather existing
objectives rather than to create new ones or even new variations on old ones,
it could even incorporate the major objectives of all concerned. Thus, the
objectives were to be meaningful and acceptable to all, a condition which
seems a contradiction in terms.


2 Lady Bridget Plowden et al., Children and Their Primary Schools: A Report of
the Central Advisory Council for Education (London: HMSO, 1967). More recently,
Samuel Meisels has shown how the movement against explicit goals has spread
through the alternative schools in this country, and he has examined the costs
and benefits to those who take such a stance, in “Goal Stating in Open
Education,” unpublished special qualifying paper, Harvard University Graduate
School of Education, Cambridge, Mass., 1971.


THE PROCESS FOR FORMULATING THE 
OBJECTIVES 


ECAPE planned that the formulation of the objectives would include the
input of lay citizens as well as of scholars and educators in eac[...]
involvement of lay persons was seen as a bold move, one going [...]
conventional procedures involved in testing. The participation of [...] three
groups was viewed as desirable for what seem to be both [...] reasons. By
making certain that the objectives were [...] by both the producers and
consumers of [...] assessment might result; by including representatives from
[...]ties (students at that point apparently being [...] their teachers and
parents),3 an assessment could forestall [...]

For National Assessment, goals must be acceptable to three important
groups of people. First, they must be considered important
by scholars in the discipline of a given subject area. Scientists,
for example, should generally agree that the science objectives are
worthwhile. Second, objectives should be acceptable to most educators
and be considered desirable teaching goals in most schools.
Finally, and perhaps most uniquely, National Assessment objectives
must be considered desirable by thoughtful lay citizens. Parents
and others interested in education should agree that an objective
is important for youth of the country to know and that it is
of value in modern life.

This careful attention to the identification of objectives should
help to minimize the criticism frequently encountered with current
tests in which some item is attacked by the scholar as representing
shoddy scholarship, or criticized by school people as something
not in the curriculum, or challenged by laymen as being
unimportant or technical trivia.4


3 [...] objectives in all subjects.

This defensive stance and the strategic considerations motivating it must
have been partly responsible for the consensus orientation of ECAPE
throughout the formulation of the objectives. Rather than using the three
chosen constituencies as advocates for objectives arising from the interests and
expertise of each group, the process actually ensured that diversity and
conflict (and perhaps searching inquiry) among and within the groups would
be kept to a minimum.5

Several components of the process contributed to this end: the crucial role
given to the contractors, the composition of members in each of the three
groups, the brief time allocated for the groups to consider the objectives, and
the dilution of the possible input by lay members by giving them an extremely
limited role.

The prominence of the contractors’ activities was evident from the outset.
In most cases, the contractors formulated a list of already familiar objectives
which was used as a guide for the two-day meetings of the scholars and
educators in each subject area, generally held during the summer of 1965. It
is difficult to know whether the contractors’ working function during the
meetings was that of attentive listener or mediator, gadfly or advocate. What
is clear is that the agenda and sometimes the initial working papers, the
minutes and reports, the ultimate decision, and probably the most lengthy
deliberations were all provided by the members of each contractor’s team.


4 Cf. NAEP, Music Objectives (Denver: ECS, 1970), p. 2. Almost identical
statements appear in the introductions to each booklet containing the objectives for
the subject matter areas.

5 This is not to imply that only through conflict among constituents could
meaningful objectives be formulated, but to emphasize that, given the opportunity,
the three groups consulted by ECAPE might have been able to present meaningful
information about different areas drawn from their own experience. For example,
John F. Kerr has discussed the “three main sources of data” from which objectives
may be derived: first, “information about the level of development of the pupils, their
needs and interests”; second, “the social conditions and problems which the children
are likely to encounter”; and third, “the nature of the subject matter and types of
learning which can arise from a study of the subject matter.” Because ECAPE needed
approval more than inquiry from its constituents, these sources of data were never
seriously tapped, and the resulting objectives exhibit this lack. See Kerr, “The
Problem of Curriculum Reform,” in John F. Kerr (ed.), Changing the Curriculum
(London: University of London Press, 1968), pp. 13-38.





Thus, the scholars and educators began their work from within a prepared
context, one which carried the authority of actual practice. Besides, these
two constituencies had been combined into a single group at the outset.
Whatever forcefulness might have arisen from their different perspectives
must have been blunted, although swifter agreement may have been gained.
Personal involvement in and of the group, in any event, seems to have been
limited. The length of time that each committee spent together as a
deliberative body was as a rule only the two-day period, and subsequent input
evidently was requested and received individually. From the register of the
schoolpeople’s affiliations, it seems debatable as to whether any actual
practitioners were included at all. Instead, the “practitioners” were persons
who, though they may have begun their careers actually working with children
in the schools, were then representatives of representative organizations
([...] NEA, etc.). This is not to imply that such delegates could not
contribute substantially to resolution of the problems at hand, only that
their [...] and insight inevitably came from a perspective that was distant
from [...] classrooms.


INVOLVEMENT OF LAY PEOPLE 


The contractors submitted the preliminary objectives [...] then made
arrangements for the participation of [...] The University of Minnesota was
designated to take charge of seeking nominations of women and men from such
organizations as the [...] Congress of Parents and Teachers and the National
Chamber [...] and of planning conferences in four regions of the [...] a
number of “lay panels,” representing large [...] suburban areas, and rural
areas.

Each panel, chaired by someone appointed by ECAPE, met to consider all
the preliminary [...] Then all the chairmen gathered for two days in New
York City in December 1965 to make their reports to ECAPE.

Input from lay people was designed from the start to be reactive rather
than initiating. The participants [...] with responding to lists of
objectives (and subobjectives) [...] exercises, which had already been drawn
up by the contractors and the other educational [...] members of one lay
panel [...] through their chairman, their [...] at having been called [...]
on short notice and having [...] time in which to [...] they met. It can be
guessed that input from lay people was further attenuated by the fact that
the selection of panel chairmen was, as was the case with schoolpeople,
top-heavy. Eight of the eleven chairmen were either members or executive
officers of local or state school boards, or leaders of statewide educational
associations.

Although these conditions made unlikely any substantial contribution from
the grass roots or even from less organized but interested lay people, it is
obvious, from the summary of the New York conference of panel chairmen,
that serious concerns and objections had been expressed at the regional
meetings. The chairmen themselves seem to have been conscientious in conveying
their members’ opinions, although in the ensuing discussion with ECAPE,
represented primarily by Ralph Tyler, who chaired the meeting, they seem to
have been quite willing to come to conclusions other than those they brought
with them from the lay people for whom they supposedly spoke.

There was no possibility at the conference for the many disparate views of
the lay people to be brought together in any coherent statement or
perspective; each chairman presented, with greetings and diplomatic thanks for
the opportunity to participate, brief sequences of his or her members’
criticisms and comments. Matters of small and larger import were presented
together, and until discussion of specific objectives began, it is likely that
the lay chairmen themselves did not know whether their panels had found points
of agreement.

As the conference progressed, it was obvious that at least one vital issue had
been addressed by all the lay panels. There was deep concern that the
Assessment would turn out to be a mask for federal control or at least
interference in educational matters, though some panel chairmen had evidently
been able to allay the fears of their groups. Tyler’s response to such
misgivings is a classic of historical irony:

I understand that some people see a specter of the Office of Education
or somebody else behind this project, but I don’t see why
they talk about that in this relationship because we are starting
here. You aren’t from the Office of Education, we aren’t from the
Office of Education. We are starting with things that the schools
are trying to do and figuring out how to get some information.6

Apart from this general apprehension, the questions raised in New York
ranged over a wide spectrum, from the fit between the prototype exercises
and the ages for which they were designed, to the exclusive emphasis on
Western classical culture in the literature and music objectives. When read
now, some of the changes that the lay people sought seem far more advanced
in educational theory and practice than the assumptions on which the
objectives had been based. The interaction between the separate subject matter
fields had been emphasized by a panel member from Pennsylvania; the need
for assessing five-year-olds in order to have a measure of achievement [...]
youngsters entered school was communicated from California; and the need
for a more differentiated adult sample, with thirty- to thirty-nine-year-o[...]
a separate category, was suggested by the Georgia panel. Several groups [...]
insistent that the Assessment take seriously its commitment to evalu[...]
attitudes and values and proffered various opinions as to what see[...]
omissions or unwarranted emphases, such as the lack of [...] development of
esthetic and moral values, the disregard of wor[...] in the Assessment, the
uncritical acceptance of the value of the [...]tists, as well as the emphasis
on European civilization as [...] American culture. While there were some
points of [...] lay people’s input was by no means unified; rather [...]
and personal and professional attitudes [...] experience. For example, a
simultaneous call [...] “creative writing” and “writing mechanics” was
sounded, [...] majority of the panels.


6 Summary of Conference of Lay Panels, p. 4.


ECAPE'S RESPONSE: THE "CARDINAL PRINCIPLE"

The format of the New York conference [...] chairmen’s presentations and
preven[...] to voice their opinions strongly or to [...] their acceptance.
But another, more subtle means through which their [...] were blocked seems
clear. The instructions given to the p[...] the meeting by Tyler narrowed and
considerably confused the [...] of their sphere of influence. Tyler explained
that while the lay p[...] of objectives which they found inappropriate, [...]
preliminary objectives would [...] complex matter. In the words of the [...]

[...] do. “It isn’t fair to assess people on what they have learned if they
haven’t had a chance to learn it.” For example, many of the
chairmen expressed concern over the use of the limitation of cultural
considerations to “Western Culture” in the art, music and
literature sections. They raised the question whether we were confining
ourselves to our own narrow little culture and implying
that this is the only thing worth learning or knowing. Chairman
Tyler said that he would take this idea back to the contractors,
but that if the schools are not teaching music and literature of
other cultures now we couldn’t assess for such knowledge. We want
to assess only what the schools are attempting to accomplish. [Emphasis
added.]7

That the objectives were to be defined in terms of “what the schools are
trying to do” is a theme reiterated so often by Tyler in response to the lay
people’s criticisms of the objectives that it can be called the “Cardinal
Principle.”8

At first it looks like a simple principle, but as it was actually used in the
determination of the objectives its meaning seems increasingly ambiguous.
While it usually implied that the determining factor in the Assessment’s
objectives was the actual goals of the schools, it sometimes appears to have
meant the actual practice of the schools. At still other times, ECAPE seems to
have used the Principle to mean what it thought the goals of the schools should
be; here the determination of certain objectives was almost wholly normative,
reflecting the opinions of the early Assessment leaders or the experts from the
contracting companies.

Which meaning of the Cardinal Principle was applied had great importance,
of course, for the politics of the Assessment and the content of the
objectives. In the working relationship between ECAPE and the lay people,
however, different usages must have had the same effect: lay views would
have to be weighed against the decisions of the contractors, who were advised
by the experts as to the actual practice or the aspirations of the schools. The
Cardinal Principle was used continuously and also inconsistently until it
became a kind of mystification through which ECAPE could defend itself
against criticism but by which it ultimately seems to have confused itself.
Certainly it made it difficult for ECAPE to hear what the lay panels were
saying or to develop logistics which might have produced a real collaboration
between the laymen and those who made the final decisions on the objectives.


7 Summary of Conference of Lay Panels, p. 1.

8 This term, coined by the authors, is obviously in no way related to the
Cardinal Principles of 1918 written by the National Education Association.

AN ILLUSTRATION: WRITING, SOCIAL STUDIES,
MUSIC, AND THE CARDINAL PRINCIPLE

Writing. ECAPE’s response to the lay panels’ strong determination to
include writing mechanics in the objectives illustrates the way in which
communication among the constituencies was muddled and the choice of
objectives complicated by inconsistent application of the Cardinal Principle.
According to the available documents, the decisive pronouncement against
the inclusion of mechanics in the writing objectives had first been given by a
member of the ETS contracting team:

The [Writing] panel members also discussed whether grammar
and vocabulary should be stressed. The panel recognized the existence
of conflicting approaches to grammar and sentence structure.
An ETS staff member distributed and explained a grammar
test which indicated that the teaching of grammar had little or no
effect on a student’s written expression. In light of this information
and other discussions which the panel members held, the
committee agreed that the survey should not assess a knowledge of
grammatical terms.9

It is possible, of course, that the Writing Panel members might well have
come to the same conclusion without the intercession of the contractor, but
not, and this is the central point, if they had been following the Cardinal
Principle, that the Assessment should be concerned only with what the
schools were actually attempting. For though omens of changes in teaching
“English” and writing were present in the work of linguists and experts in the
language arts, grammar was still a major thrust of the curriculum in English
classrooms throughout the country.

Even while word of the linguistic theories being developed by Noam
Chomsky and his colleagues at MIT and by Roger Brown and his group at
Harvard was filtering down from the universities to institutions of teacher
training, the linguists themselves were vehemently denying that their field
was in a state advanced enough to bear application to the classroom. And
while the Dartmouth Conference produced yet another theoretical blow to
the classroom teaching of mechanics in English, traditional practice was not
widely affected. At Dartmouth College in 1966, the same summer in which


9 NAEP, Writing Objectives (Denver: ECS, 1970), p. 5.




the Writing objectives were being developed, American and British language
arts specialists criticized traditional structural analysis of English and explicit
teaching of grammar as impediments to the knowledge and use of the
language. But new curricula had not yet been written, much less found their way
into classrooms.

The original Writing objectives reflect the orientation of the Dartmouth
Conference and are clearly committed to actual rather than analyzed
language use in the classroom. But then, as now, the goals and methods used in
English classes stressed identifying a preposition and diagramming it in a
written sentence more than using it in daily speech. These objectives exhibit
the Cardinal Principle used as a normative standard of what the schools
should be attempting, not what they were actually trying to do.

The response of the lay people could have been anticipated. Many panel
members were adamant that the objectives should not downgrade the
mechanics of writing; in particular, they complained about the wording of
“Objective I: Write to communicate adequately in a social situation”:

It was suggested that in the amplifying statement under this
objective the section, “correct spelling, grammar, mechanics, and
the like are not essential to the effectiveness of social
communication,” indicated a tolerance of sloppiness and should be deleted.
Many of the chairmen reported that their groups had felt strongly
about the above statement. They realized that social writing
normally doesn’t stress mechanical correctness, but they didn’t feel
that mechanics should be downgraded.10

There are few educational subjects about which lay people might be
expected to speak with such confidence as their own language. For this reason,
and also perhaps because it is difficult to accept the suggestion that the
endless hours of spelling, punctuation, and usage drill in English classes may
not have been important in helping one to communicate adequately in a
social situation, the lay people saw the wording of the objective as a potential
threat to the clarity of their language. More fundamentally, however, while
experts may stress the authentic quality of interpersonal communication over
what they call the “artificial” standard of “correctness for correctness’ sake,”11
lay people have good reason to recognize that the social acceptance of an
invitation, a thank-you note, or a formal address to a group may ultimately


10 Summary of Conference of Lay Panels, p. 6.
11 NAEP, Writing Objectives, p. 10.





depend on its conformity to social conventions, rather than on what is really
being said. Language arts experts may hope to change this basis for social
judgment, but for the general public it is a fact of life.

Tyler’s rebuttal and the lay chairmen’s response are summarized as
follows:


Dr. Tyler explained that in the assessment, samples of writing for
a social situation would be collected and they would be judged in
terms of content, organization, clarity, legibility and mechanics.
What the objective is indicating is that a personal letter is to be
judged more in terms of what it says to you as a person than on
the accuracy of the mechanics. However, the chairmen were
clear that the assessment should not be saying that the mechanics
of writing is unimportant.12

The final form of the objective does not seem to have been much
influenced by the lay people’s wishes:

[...] social situations call for the communication of thoughts,
observations, and facts so organized that they [...] the reader;
correct spelling, [...] not essential to the effectiveness of most
social communication. Of course, correctness is hoped for [...]
“better” performance anticipated as [...] matures, but social
communication relies principally on factual accuracy,
organization, and flavor.13

Ironically, more than five years later, NAEP was forced [...]rate post hoc
report on the Assessment results [...] reason for this was continuing
p[...] the ECS Advisory Council, who became involved after [...]
Assessment in Writing had been [...] counterparts in [...]

Social Studies. ECAPE took [...] the Cardinal Principle and had [...]
social [...] rather than on the [...] on competence in a broad [...]
field [...] in most schools across the country [...] common school
meaning [...]

12 Summary of Conference of Lay Panels, p. 6.

13 NAEP, Writing Objectives, p.





knowledge of facts in history, geography, and current events dominates the
field of social studies, and the lay people wanted this reality to supersede the
broader aims of the original objectives. They wanted new ones which would

assess the degree to which there is basic subject competence in
such fields as history, geography, economics, political science,
sociology. Then, from these subject areas, objectives could be
developed to assess the child’s or adult’s ability to draw up,
stimulate, utilize, interrelate these many frames of reference.14

Three months after the meeting of the lay chairmen, a social studies
conference was called at the Center for Advanced Study in the Behavioral
Sciences in Palo Alto. Contractors applied for the redrafting of the Social
Studies Objectives, partly as a result of the lay recommendation. Subsequently,
ETS recontracted to produce new objectives, which were sent to one member
of each of the original eleven panels which had reviewed the earlier objectives
the previous fall.

Perhaps the original Social Studies Objectives were unsuitable for reasons
other than their distance from the actual classroom; the Objectives booklet
suggests that they were changed because they were not behavioral enough. It
does not seem, in any event, that the experts in this instance were any more
consistent about applying the Cardinal Principle than were the lay people.

Music. In the case of the Music Objectives, both lay people and professional
musicians questioned different aspects of ECAPE’s formulation, and the two
groups were answered by a shifting application of the Cardinal Principle.
The lay panels demurred at the inclusion of music in the Assessment precisely
on the grounds of whether or not it was a significant curricular goal in the
schools. Tyler’s response does not deal directly with their hesitancy, but
stresses ECAPE’s reliance on the expertise of advisory panels:

Dr. Tyler pointed out what he felt was the philosophy of this
assessment. If the [advisory] panels say that these are good things
to go at, and this would be true in mathematics as well as music
or art, the students won’t all get to the same level. If music is
important enough to be included, we would report such things as
a great many people have learned to sing and to get satisfaction
from it. We are saying that this kind of information is useful
[...] music is something that is desirable to learn,
though, as with mathematics, we aren’t trying to say that
everyone should get to a particularly high level.15


14 Summary of Conference of Lay Panels, p. 7.
15 Ibid., p. 7.




Although it is reported that the lay panel chairmen then arrived at a
consensus that an assessment of music would be valuable, many of them went on to
express their dismay that jazz and contemporary music had been omitted.
They felt strongly that these should be included. Ultimately, as reported in
the Music Objectives booklet, these latter forms of music were not included,
although professional musicians who had been consulted had also requested
them. The reason given was that they were not being taught in the schools.

The Cardinal Principle’s vagueness and inconsistent application not only
blocked the opposition of lay advisors and some professionals to various
objectives, but also supported the always prevalent conception of objectives as
being “objective” or “value-free.” The description “what the schools are
attempting to achieve” promised that an objective amalgamation of goals
[...] and would emerge. This is, of course, impossible. But rather than squarely
confront the matter of whether some choices were inevitable, and [...]
problematic matters with the laymen and others interested [...],
ECAPE obscured this very complicated issue. Tyler’s [...] statements
merely implied that not all the goals of the schools were [...]
when he suggested that the objectives should be those which [...]
teachers, and curriculum specialists believe faithfully reflect [...] of
that field and which the schools are seriously [...]
ignored here (although hinted at by the word [...]) [...]
some people are going to have to [...] from the vast
array of “unpopular” and “popular,” [...] and “traditional,”
“diverse” and “uniform,” and “stated” and [...] education will
be considered the real objectives of the [...] classrooms.

For the moment we can judge ECAPE’s view [...] country should resolve these
issues by considering its actions. The lay people were considered
unqualified to delineate the objectives and [...] them critically after
they had been formulated by the experts. [...] drew up the
original Social Studies Objectives and those who omitted mechanics
from the Writing Objectives were [...] Cardinal Principle [...]
the ECAPE leadership and other [...] were
ultimately those which were [...] original constituencies [...]
It would appear, then, [...] educator and lay [...] advocates for their
interests and in ways that promote [...] conflicts among their [...]
arise, they were [...] lack of clarity and inconsistency.
Over [...] within ECAPE took place through meetings [...] of different
persons were [...] and most commonly [...]
suspects, through secondhand reports of what others had said and




meant. The entire process was presided over by the contractors, ECAPE staff
members, and the symbolic notion that somewhere in public practice was to
be found a key to what constituted appropriate objectives.

This might be called a process of pseudo-participation, one which follows
the letter of the original stipulations but which ultimately involves no one’s
deeply held commitments. It seems likely that parents’ aspirations for their
children and for their society were never brought into real dialogue with
teachers’ goals nor with the objectives that might emanate from the cutting
edge of scholarly work. This being the case, the results might be expected to
be rather disparate lists of objectives which would safely conform to a
generalized notion of the nation’s educational (read “subject-matter”) objectives,
in terms of some unidentified, rather hazy, and certainly distant point of
view. Vague goals or extremely specific items, none deep enough to provoke
dissent or dismay from too many people, and mostly unrelated to any
coherent perspective on the subject area or students’ developmental growth, could
have been predicted. This is indeed what the final objectives, with certain
exceptions, appear to be.16


THE CONTEXT FOR THE DEVELOPMENT OF THE 
OBJECTIVES 

ECAPE’s original hope, that the objectives would be meaningful and
acceptable to all three of its constituencies, could have been called naive even
in 1964, given what appear to be the realities of the process by which the
objectives were formulated; and in light of contemporary developments and
interests, such a notion seems woefully simplistic.


16 Obviously, no general statement can apply equitably to all the
subject-matter objectives, and it is worth mentioning briefly some of the variations in quality
which we have found among them. The quality of the content and presentation of the
Literature Objectives seems very high, and the Music Objectives imaginatively
address some very difficult problems. Of all the objectives, those in Social Studies and
Citizenship are the most confused and would seem to require intense concentration.
The Reading Objectives, according to specialists whom we consulted, are excellent; in
contrast to most of the other objectives, their hierarchical arrangement is especially
sound and developmentally appropriate. It should be noted, however, that since the
Assessment begins in Reading, as it does throughout, with nine-year-olds, the
difficulties of setting objectives for the years during which children learn to read could be and
were avoided.

Special attention must be drawn to the objectives in what came to be called
“Career and Occupational Development.” Here the criticisms we have made of the
process by which the objectives were developed are not applicable. Competing
contractors with different approaches to this difficult area were involved in lengthy
deliberations with professionals and lay people. The resulting objectives and their
presentation are a product of five years of work, indicating that, had comparable time
and energy been expended on other areas, the objectives might have been vastly
improved.

For one thing, interest in unearthing the hidden curriculum and its
unstated objectives has since produced justified skepticism that explicit
statements of educational purposes are meaningful indicators of what is really
taking place or what was really intended to take place. Instead, as Robert
Stake has pointed out, statements of objectives are often really “formal
transformations of values” by which personal or institutional preferences and
biases are obscured. Secondly, although most of the final objectives and
subobjectives fell short of the original intent to formulate behavioral objectives,
ECAPE’s commitment to statements of behavioral goals, very much in tune
with the effort to make education a scientific endeavor, has never wavered,
even though its accomplishment would be quite problematic. Now [...]
demand for accountability hits many classroom teachers as well as statewide
administrators, behavioral objectives continue to be taken as the [...]

Another phenomenon casts doubts upon the probability that the
Assessment’s objectives can succeed on their original terms. Contemporary
criticism of the schools extends to the assumption and practice of [...]
educational aims and methods that have long been supported by professional
educators. Pluralistic life styles have been increasingly [...]
celebrated in society, and pressure has been put on [...]
accommodate them. A multiplicity of purposes must be reflected [...]

Another major element of the original [...] of [...]
to question: the simple definition of the subject matter areas [...]
objectives from ten separate [...]
This definition seems never [...] been seriously challenged within the
policy-making meetings of [...] have seen, one lay panel
raised the possibility that interrelationships [...] the disciplines might be
reflected in the objectives. [...] after the dubious decision to
[...] number of distinct subject areas [...]

„ A much needed m "£“ 

haps a reversal ^‘‘"' ^“^overview of enter, on ’,S«» 

any md.cauon Ind“ded “ ha j len ge to the enure process of *«''■* Su «, 
and impressively considered ^ ^ Gene R Ha- - ^ 

according n> g* « “L„eed Tesung A— ■» ^ 

University Oriten 

ary 1973 





interdisciplinary objectives that reflected a more sophisticated concept of
knowledge and the learning process.

There seem to be three major justifications for the ECAPE/NAEP decision
to fragment the development of objectives into discrete subjects. First of all,
the inception of the Assessment coincided with a renewed classicism,
stimulated by an intense interest in the state of the public schools and in an
attempt to improve the academic preparation of the nation’s youth. Such
concern was expressed in several quarters, notably by professors in prestigious
universities who were dismayed at the incompetence of entering students
admitted in increasing numbers from public schools, and by those, like
Rickover, who were fearful that America’s defense capacity, shamed by the
appearance of Sputnik, was being jeopardized by poor training in the
sciences.

The push was on to halt the “dilution of knowledge.” Top educational
professionals, many of whom participated in the Assessment, joined with
scholars to draw up more rigorous curricula (first in science and math, but
later in social studies and English) and to lure bright college graduates into
the teaching profession. The ambitious curricular projects were zealously and
expensively undertaken, buoyed by Jerome Bruner’s dictum that if energies
were channeled toward finding the appropriate forms, any subject could be
taught to anyone of any age without losing the integrity of the subject itself.
In the wake of the pedagogic fervor and optimism of the late fifties and early
sixties, perfecting the teaching of the subject areas in the schools was perhaps
as broad an educational goal as those educators who planned the Assessment
could envision.

Second, the division of the objectives into subjects emanated from the
desire to formulate the Assessment as closely as possible in the conventional
terminology and principles of the schools. As noted earlier, school people
themselves had to be convinced that the Assessment was, on the one hand,
nothing at which to take alarm and, on the other, something that might
benefit them. If a major objective to be evaluated had been “to formulate
hypotheses about the causes of an event” (which would have included
training in science, social studies, and literature and could have involved content
from any of these fields), the role of curriculum specialists advising the
Assessment would have been much more complicated than under the
traditional scheme. If, as one lay panel suggested, oral and written study and use
of language had been seen as part of a general topic such as
“Communication” or “Language Arts,” reading and literature specialists might have
complained that their own domains had been trespassed upon.





For instance, who would be held accountable if the Assessment were to
report that 60 percent of the nation’s thirteen-year-olds could not find their
way from one point to another on a map of a hypothetical city? Successful
execution of this task calls for a complex act of knowing, and while most
teachers would deplore such a failure and most lay people would blame the
schools for it, the skill itself would probably not be listed in any conventional
list of objectives for any particular subject area.

ECAPE’s consideration of objectives manifestly did not begin with the
needs for certain skills nor with the question of priority among various skills,
abilities, and understandings. It began with the subject-matter divisions and
with the contours of knowledge which each discipline had traditionally
fashioned. The resulting objectives do not suggest adequately the relationships
among various fields, although in some cases the objectives include abilities
which are fostered and perhaps even initiated in other fields. In Science, for
example, critical reading to determine what is relevant or irrelevant,
hypothesis, fact, or opinion, would seem to have a close connection with
similar activities in Reading, Literature, and Social Studies.

A third source of the ECAPE/NAEP conception of objectives, [...]
which may explain the [...] that the [...]
Assessment, was a feeling among some of the [...] that the
development of objectives was merely [...] to facilitate the
development of exercises by the contractors. During interviews for this report it
became clear that this relatively simple utilitarian [...] prevailed during the
Assessment’s formative stages, even though some [...] on the NAEP
staff now claim the delineation of objectives [...]
booklets as one of the Assessment’s major achievements, rather than just a
means to create and validate the [...]

A related set of criticisms concerns [...] presentation of each subject area in
the individual objectives booklets. [...] Literature and Music,
there is scarcely any attempt to relate the list of individual objectives to a


18 A competing view of the [...] has been offered recently by
consultants to NAEP. They [...] meaningless and instead insist
that the exercises should [...] as the subject [...] of application
of knowledge [...]





conceptual understanding of the field: how it was perceived by those who
defined the objectives, what its importance might be for those who participate
in it, and what relation it bears to other aspects of human activity. Thus the
objectives, no matter how commendable, seem to exist de novo, as if in total
isolation from preferences, values, or perspectives on each subject. Within
almost no area is there even a minimal attempt to point out which objectives
might be considered primary and which secondary, and for which groups of
the population. While it is occasionally indicated that one objective is related
to others and that some may be more important at some ages than others,
there is no presentation of a whole and how its parts fit together theoretically,
empirically, or developmentally.

It may be objected that the objectives were designed to be simply that, and
not complex statements about knowledge and cognition. Had attention been
paid, however, to what Paul Hirst has called the relationship between each
discipline and “the development of mind and the nature of knowledge,”19 the
objectives, even within each subject, might have been more systematically
organized to present a more coherent view of each area. Within each it also
would have been possible to consider the question of what is essential and
abiding within the discipline and what is determined by the historical or
cultural context. If such a consideration had been made, it might have been
possible to respond to particular requests by the lay review panels, using
broader procedures for ascertaining how the objectives could change and to
what constituencies NAEP would be responsible. In Literature, for example,
the request that non-Western literature be included could have been seen as
an expression that the fundamental inquiry of literature takes place within
particular and individual works whose salience to the individual depends
upon his or her age, cultural milieu, purposes, etc. The popular appeals for
inclusion of mechanics in the Writing Objectives could have been dealt with
more adequately than by a belated reanalysis of the writing exercises and
the publication of a special report on an objective that had originally been
excluded.

Because such fundamental issues as those outlined above were given little
attention, and were not taken into serious account in either the development
of the objectives or in their presentation to the lay consultants, the way in
which they are now offered to the public is quite superficial. Various essential


19 Paul H. Hirst, “The Contribution of Philosophy to the Study of the
Curriculum,” in Kerr, op. cit., pp. 39-62.





objectives in different subject-matter fields is highlighted by this
procedure, thus suggesting possible points of integration.22

Throughout the Taxonomy is an impressive concern that it be useful to
practitioners at every level. Here teachers accustomed to uttering global,
vague objectives, such as that their students should “really understand” or
grasp the core or essence of a thing, were encouraged to inquire what they
really meant.23

ECAPE might also have taken heed of the memo from John Flanagan of
Project TALENT, to which we referred on page 21. It was dated April 14,
1964, and titled “Evaluating the Effectiveness of Elementary and Secondary
Education: The Need for Objectives in Terms of Behavior.”24 This memo
was provocative but pessimistic; Flanagan’s description of objectives was
imbued with a profound awareness of the relationship of learning to the needs of
the individual, but he was doubtful that existing instruments could
adequately evaluate the attainment of such objectives. What might have been of
particular interest to the Assessment’s goals was his insistence that behavioral
outcomes could be designated for the “progress which each student has made
along the unique pathway leading to his or her self-realization.” Such an
approach was, he argued, consistent with the general aims of education,
usually agreed upon as “developing the individual’s maximum potentials and
producing free and responsible citizens.”

Flanagan added an opinion about his work which ECAPE might have at
least considered:

In accordance with most recent thinking in regard to public
education, these statements refer to individuals and not to fields of
knowledge or disciplines.

The growth of the individual thus takes precedence over the specific content
which may further it. Flanagan went on to say that AIR (American Institute
for Research) had developed procedures for describing the individual’s needs
in terms of behavioral objectives, although satisfactory means for assessing the
outcomes of responsible citizenship or individual fulfillment through
vocational efforts had yet to be devised.


22 Bloom, op. cit., p. 22.
23 Ibid., p. 21.
24 Carnegie files.





in spite of the emphasis by scientists on such goals as
appreciation of science, understanding the scientific method, the need for
rational decisions, and the fostering of ingenuity and imagination,
these outcomes are represented either not at all or in very
inadequate fashion by present instruments. The outcomes that are easy
to measure, such as knowledge and application of prescribed
formulas, account for most of the questions.


Perhaps it would have been impossible for the Assessment to make
substantial use of either Bloom’s taxonomy or Flanagan’s perspective. The
commitment to separate subject areas came early and was sensible because it would
be readily accepted by professional educators. And viewing the [...]
from the individual learner’s point of view must have seemed somewhat
distant from the purposes of a national assessment in which large numbers of
people were to be reached and for which one of the early hopes was
comparison among states or even local school districts.

Within the limitations of subject matter objectives, however, [...] attempt to
make distinctions among objectives, which was [...] of [...]
but not in others, might have produced a more meaningful presentation [...]
each field. Serious attention to Bloom’s work [...] indicating
the cognitive processes at work on each level [...] have guided
the designers of the exercises as well. Once the [...] political
reasons designed to make comparison [...] objectives
might also have gained depth [...] Modal types of
individuals from many situations [...] have
been posited, educational objectives for them described, and
appropriate means designed to assess their progress. The numbers
of respondents would have been [...] NAEP’s
substantial resources could have been [...] developing instruments
that Project TALENT, according to Flanagan, lacked the funds to develop.







the Assessment could begin, this precedence was one of practical procedure,
not of priority.

Yet, as noted briefly above, in later years the objectives have assumed
major importance. All the booklets describing them to the public contain a
statement almost identical to the following one, which is taken from the
Introduction to the Music Objectives:


After objectives for Music and the other assessment subjects were
initially developed, they were compared to other statements of
objectives in these areas which had appeared in the literature
during the past twenty-five years. It was clear that National
Assessment had not produced “new” objectives in any subject
area. Rather, these objectives were restatements and
summarizations of objectives which had appeared over the last quarter of a
century. This was a desired and expected outcome in that one
criterion of National Assessment objectives was that they be
central to the teaching efforts of educators.

The job of developing objectives has not ended, however. For as
the goals of the educational system evolve and change, so must the
objectives used by National Assessment likewise change. This
means that there must be continual re-evaluation of the objectives
in each National Assessment subject area.

By providing this continuing process of re-evaluation, the
National Assessment program hopes that it can attain its primary
goal of providing information on the correspondence between
what the educational system is attempting to achieve and what,
in fact, it is achieving.25

The contradictions in this excerpt are significant for what they reveal
about the ambiguities in the objectives. First, satisfaction is expressed that the
objectives within each subject area are virtually identical to those already
[...] in the literature of the past twenty-five years. That satisfaction
rather than chagrin is articulated at this outcome to a lengthy and expensive
process may indicate NAEP’s wish to assure educators that the Assessment
will not threaten current practice. The assumption here is that continuity in
subject-matter objectives occurs because what is “central to the teaching
efforts of educators” does not change, has not changed.

This recognition and celebration of the traditionalism of the objectives
is significant because it is inaccurate and misleading, largely through
[...]ions which were partly intentional and partly the result of insufficient


25 NAEP, Music Objectives, p. 8.





understanding of the latent functions of the delineation of subject-matter
objectives. Clearly, in some areas the Assessment is creating “new” objectives
in a variety of subtle and not-so-subtle ways. At a general level, the emphasis
on national educational objectives itself is a significant force in the new
definition, or renewed definition, of educational objectives at the state and local
level. Old objectives may remain unchanged, but if they have been virtually
moribund, their revival constitutes creation of new objectives. More
specifically, and more discreetly, the Assessment developed new objectives in the way
in which it chose to define the field of Career and Occupational
Development, and in the very creation of the new subject area of Citizenship, as
well as in its handling of some of the traditional subject areas. In Writing
there was the clear attempt on the part of the contractors and Assessment
staff to deemphasize the importance of grammar and other aspects of writing
mechanics. Surely, whether one favors such a deemphasis or not, it constitutes
a new definition of the objectives of teaching or learning writing.

The statement included in each of the objectives booklets and quoted
above explicitly contains a [...] and important contradiction [...]
over the past quarter century is the assertion [...]
matter objectives are likely to change in the [...] of claim then
becomes that it will document such changes, [...] reconstitute
the Assessment’s objectives [...] in accordance with these changes and,
we may assume, revise the exercises to measure the achievement [...]

NAEP has come to consider part of its [...] synonymous
with subject matter objectives [...]
new educational purpose [...] as expanded decision making
for students or stronger self-concept [...] minority and female children can
be translated into separate subject matter [...] a challenging problem.

It is difficult also to [...] the original
objectives, which supposedly [...] from those of the previous
quarter century, [...] could be [...] changes in objectives
over each period of the Assessment [...] a decade of assessments. This
question seems to be still very [...] NAEP. Those members of the
Exercise Development Team in Denver [...] are working up the Assessment’s
second phase speak of the major [...] educational objectives
which they [...] for the more encompassing
aspects of the [...] including Dorothy Guilford of the Office of





Education’s Center for Educational Statistics, are rather certain that, when
they are translated into stated objectives, what appear to be changes will turn
out to be largely restatements of the existing NAEP objectives. What those
who focus on the near terrain take for major crevices and outcroppings may
turn out to be, when viewed with extended perspective, mere shadows on a
generally continuous surface.

It might be argued that much of this analysis is critical of the objectives
from a position outside that which ECAPE was working from, and that political
acceptance of the Assessment necessitated a bland and simplified set of
objectives and a process which would produce them. Since this was to be a
national list of objectives, the argument might continue, only a list derived
from a defensive stance, one meant to preclude attack from regions and
organizations, would suffice. Besides, in 1965, it was bold to involve so many
constituencies, even if the actual level of involvement was low. To begin to
consider the wide variety of approaches to schooling and even the major
discrepancies across the country might have made impossible any resolution
at all.

Even if we accept these excellent arguments, we are left with crucial questions.
What meaning can there be in a national assessment which is so enmeshed
in compromise and superficiality from its very beginning? Are there
ways of ensuring that, once a benign and nonthreatening assessment is
brought into being and is accepted, it can be transformed into a more profound
assessment? What would a more meaningful set of objectives include,
and what would be its purpose? Further consideration of these questions and
issues must await the remaining sections of this book, which are concerned
with the National Assessment’s management, politics, and utility.



5 


THE ASSESSMENT’S
EXERCISES


Once NAEP had developed subject matter objectives, its
next task was to create the exercises to be used in evaluating achievement of
the objectives. NAEP established a number of challenging criteria the exercises
had to meet, but, as we shall see, there were challenges that NAEP did
not anticipate or acknowledge, and these led to a variety of problems.
Certain of NAEP’s criteria were given special emphasis. First, each exercise
was to have content validity; that is, each individual exercise was to measure
directly and clearly achievement of one (or more) of the behavioral
objectives. Second, equal numbers of easy, average, and difficult exercises
were to be written. Some exercises were to be so easy that 90 percent of
the respondents would answer them correctly (the so-called 90 percent exercises),
while others were to be so difficult that only 10 percent would answer
correctly. In this way the range of abilities, skills, and understandings of the
population would be quite fully described. Finally, the exercises were not to
be restricted to the traditional multiple-choice format but were to take whatever
forms (e.g., interview, essay, or task performance) would be most appropriate
for the behavioral objectives involved.1

To prepare the exercises NAEP contracted with the agencies that had 


1 For a complete list of the exercise development criteria, see Carmen J.
Finley and Frances S. Berdie, The National Assessment Approach to Exercise Development
(Denver: NAEP, 1970), p. 15.




written the subject matter objectives, but the exercise development criteria
proved difficult to meet. In part, the problem was due to the contractors’
inexperience with the kinds of exercises NAEP sought; the standardized tests
with which the contractors were then familiar did not emphasize content
validity of items, nor did they make use of 90 percent exercises. Also, in part,
difficulty arose because of NAEP’s poor management and planning of the
exercise development process. Most important, the exercise criteria themselves
contained several unresolved predicaments concerning the relationship
of exercises to objectives, and these continue to make the writing of Assessment
exercises a formidable task.


THE EXERCISE DEVELOPMENT PROCESS 

There is little evidence that either the Assessment leadership or the contractors
were aware of the magnitude of the exercise development effort required.
The Assessment’s initial plan required that all exercises submitted by
the contractors be reviewed twice: first by lay people, to ascertain that no
exercise was potentially offensive to any large group of people, and second by
experts, to verify that each exercise did indeed evaluate the educational objective
for which it was intended and that it had no identifiable flaws.2 The
entire process was expected to take a year and a half. Before work was complete,
however, at least three other major lay and professional reviews were
undertaken, as well as a myriad of in-school and out-of-school feasibility
studies and pretests. More than three years was required.

The first review provided for in the original NAEP plan was the review by
lay people. Of the exercises developed by the contractors by spring
[...], about 1,200 were judged by the Assessment staff to be sufficiently
controversial to require lay study. These were discussed at conferences attended
by representatives from organizations including the American Association of
University Women, AFL-CIO, NAACP, National Conference of Christians
and Jews, PTAs, school board organizations, and the U.S. Chamber of Commerce.
About 100 exercises were dropped as a result of the review, primarily
in Literature, Reading, and Social Studies. Some 200 Citizenship exercises
were revised.3


u baled up.“'i'X'«d IWd'le' 1 ',",' T"' " "" ,hr pr " c “ 

’’ 1 f 'he or rcvltrd Jec Fin|(y ,„ d 





The other review included in the original plan was a mail review by subject-matter
specialists. About 70 percent of the 21,000 exercises developed
were submitted to reviewers, the remainder having been eliminated by the
staff or the lay panels. Three to five reviewers for each subject area were
selected from nominees proposed by major professional associations in education,
and each exercise was studied by one reviewer.4 While the review provided
a wealth of commentary, the contractors who had written the exercises
were perplexed in interpreting much of what was said or reconciling conflicting
views. Reviewers often asked how particular exercises fit into the broad
picture; each reviewer had been sent only the objectives relating to the specific
exercises he was to review.


One problem about which the reviewers agreed, however, was that considerably
less than 90 percent of the population would respond correctly to the
“easy” exercises. As a result, several studies were commissioned to determine
how easy the “easy” exercises in fact were. About fifty easy exercises were
administered to respondents from high and low socioeconomic backgrounds
for each of the ten subject areas. As expected, the responses showed that few
“easy” exercises could be answered by the hoped-for 90 percent. There were
clear difficulties in communication, vocabulary, and format. As a result of the
study, contractors were asked to write new, easier exercises.
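
The check described above, whether an “easy” exercise is in fact answered
correctly by 90 percent of respondents, can be sketched in a few lines of
Python. This is only an illustration: the 0.90 target comes from the text, but
the function names, the exercise labels, and the response data are invented
for the example and do not reproduce any NAEP procedure or result.

```python
from typing import Dict, List

TARGET_EASY = 0.90  # target proportion correct for "easy" exercises (from the text)


def proportion_correct(responses: List[bool]) -> float:
    """Share of respondents who answered an exercise correctly."""
    if not responses:
        raise ValueError("no responses recorded")
    return sum(responses) / len(responses)


def exercises_below_target(results: Dict[str, List[bool]],
                           target: float = TARGET_EASY) -> Dict[str, float]:
    """Return the exercises whose observed proportion correct is below target."""
    observed = {name: proportion_correct(r) for name, r in results.items()}
    return {name: p for name, p in observed.items() if p < target}


# Invented pretest data: True marks a correct response.
pretest = {
    "easy_01": [True] * 19 + [False] * 1,
    "easy_02": [True] * 14 + [False] * 6,
    "easy_03": [True] * 18 + [False] * 2,
}
flagged = exercises_below_target(pretest)
print(flagged)
```

With these invented data only the second exercise falls short of the target and
would be returned to the writers for simplification.
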
Following the mail review, several conferences were held, and reviewers
continued to come across substantial problems. Some reviewers complained
that the exercises merely “scratched the surface”; others found them
bookish, pedantic, unimaginative, and sometimes imprecise or plain
incorrect. Most consultants agreed the vocabulary was too difficult and that the
exercises reflected an exclusively middle-class bias.

Additional conferences were arranged between 1967 and June
1968 to evaluate the contractors’ revisions following [...]. This
time it was felt that significant progress had been made, and it was agreed
that five subject matter areas — Writing, Citizenship, Literature, Science, and


Jerdie op cit , p 40 from the International IRrad 

* Nominations for participants were ret ' Nat ,„„al Council of Teach 
ng Association, National Council of Teachers S x „ ch er Educators National 
Jo, Mathematics, National Association of Tndunmd Amen»n V«a 

icence Teacher Assoc.al.on, “ N an„„al 
lonal Assoc, anon. National ™ Educator Nauonal Couference, 

ial Studies, American Historical A® 0 ® 
ind National Association of Schools o 





Social Studies — were suitable for final pretests and lay review. The Science,
Writing, and Citizenship areas were selected for the first round of the
Assessment.

At this point there remained the matter of reducing the exercises in each
area to a number which could be administered in the time available: 160
aggregated minutes per subject area. Once the professionals and lay people
were done with all their reviews, there remained about 181 minutes in Citizenship,
348 in Science, and 224 in Writing. The contractors then rank-ordered
the exercises on the basis of quality, reportability, and balance across
objectives and difficulty levels, and the Assessment staff determined each
area’s final set of exercises.
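
The scale of the cut can be made concrete with a short Python sketch. The
160-minute budget and the minutes remaining in each area are taken from the
text; everything else is an assumption made for illustration. In particular,
the source does not say how the staff moved from the contractors’ rank order
to the final set, so the greedy pass below is a hypothetical stand-in, and the
small exercise pool is invented.

```python
from dataclasses import dataclass
from typing import List

BUDGET_MINUTES = 160  # time available per subject area (from the text)

# Minutes of exercises remaining after all reviews (from the text).
REMAINING = {"Citizenship": 181, "Science": 348, "Writing": 224}


def excess_minutes(remaining: int, budget: int = BUDGET_MINUTES) -> int:
    """Minutes that must be cut to fit the administration time."""
    return max(0, remaining - budget)


@dataclass
class Exercise:
    name: str
    minutes: int
    rank: int  # 1 = best; hypothetical composite of quality, reportability, balance


def select_within_budget(pool: List[Exercise], budget: int) -> List[Exercise]:
    """Walk down the rank order, keeping each exercise that still fits."""
    chosen: List[Exercise] = []
    used = 0
    for ex in sorted(pool, key=lambda e: e.rank):
        if used + ex.minutes <= budget:
            chosen.append(ex)
            used += ex.minutes
    return chosen


cuts = {area: excess_minutes(m) for area, m in REMAINING.items()}
print(cuts)

# Invented pool, trimmed to a 10-minute budget.
pool = [Exercise("a", 5, 1), Exercise("b", 4, 2),
        Exercise("c", 3, 3), Exercise("d", 1, 4)]
kept = [ex.name for ex in select_within_budget(pool, budget=10)]
print(kept)
```

On the figures in the text, Citizenship needed to shed 21 minutes, Writing 64,
and Science 188, so that more than half of the surviving Science material
could not be administered.
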

Overall, there seems to have been a minimum of effective planning and
control by the Assessment staff. There is no evidence of systematic scheduling
of contractor activities, time allocation (“PERT diagramming”), or clear decision
points. Special studies appear to have been undertaken in a strictly ad hoc
fashion, without considerations of careful design so that maximum information
might be obtained at minimum cost. Nor is there any evidence of a
precise formulation of the desired goals of the exercise development program,
i.e., in judging the quality of the exercises, “How good is good enough?”

NAEP has made some effort to remedy certain of these ills. For example,
disappointment with the exercises has led the Assessment staff to rely less
heavily on outside contractors and to establish increased internal exercise
expertise. In addition, the staff has concluded that the system of checkpoints
along the exercise writing path was insufficient. Because reviewers often evaluated
exercises only when they had reached their “final” form, many had to
be rejected, representing a considerable investment in time and money. For
this reason, NAEP established a five-phase exercise development process for
future rounds: (1) development and review of objectives and prototype exercises,
(2) preparation of exercises, (3) review and revision of exercises, (4) field
testing and revision of exercises, and (5) final review and selection.

The major alterations incorporated in the new five-phase program occur in
phases 1 and 2. During the review and revision of objectives (phase 1), a
group of subject matter specialists prepares a small number of complete prototype
exercises designed to explicate the nature of the objectives and serve as
models for later exercises. When the objectives and prototypes are complete,
phase 2, the actual writing of exercises, begins and involves two stages. One
half the desired number of exercises are completed first, to be evaluated by
the Exercise Development Advisory Committee or Assessment staff. Only
when any problems uncovered at this juncture have been solved are the
remainder of the required exercises written. The review and revision, pretest,





and final review (phases 3, 4, and 5) are similar in method to that utilized for
the first round, embracing both lay and subject matter reviewers. The entire
process consumes about three years.

While these steps may be wise, they are also only a beginning. Technical
problems, including large questions about the relationship of many of the
exercises to the subject matter objectives, indicate that adequate exercise development
requires an effort characterized by a degree of comprehensive
planning and imagination so far not evidenced. The primary weakness of the
exercise development process can perhaps best be analyzed as a failure to
establish an effective research and development organization. Manifestations
of this failure have been enumerated above: insufficient understanding of the
magnitude of the task, minimal overall scheduling, and inefficient utilization
of contractors and other resources.

In part, the trouble may have arisen because of the novelty of the project,
but, on the other hand, such novelty ought to have been expected to inspire
unusually high management ingenuity.


TECHNICAL ASPECTS OF THE NAEP EXERCISES 

Many of the exercises used the Assessment’. ftrs. 
standards that NAEP itself set for them disregarding ^ ukr , tcm 
more stringent criteria that might have een emp o cxcre ises have 

analysis indicates that along several important d, ™ e " su bobjectives 

an extremely poor h, w„h ***%£££££ £ Lt J ft. are 
which they are supposed to measu Citizenship Reading and 

analyzed below All released exercises m Sc ence P , he 

Literature were reviewed, and those cued below are , mended 
range of problems and prospects 5 

Distribution of Exercises by Objectives and Subobjectives

The NAEP behavioral objectives The objrcl.ves organize 

cast as objectives and visions, and the subobjectives part.cu 

the subject in terms of » lew la 5 ____ — 

s Not all exercises used - — jSTSU** 
pubbc Abo., 50 percen, - may 

exercises are representative* * „ emse release policy. *e Chapte 

ty For further discuss.cn of NAt. 





in more or less detail. Sometimes the objectives organize
the subject on the basis of content; for example, in Writing, one objective
concerns business communication, and another scholastic writing. Sometimes
the organization is based on the kinds of knowledge or ability
involved; in Science, there are four objectives: the first covers
facts and principles; the second, skills and abilities; the third,
understanding of the methodology of science; and the last, appreciation of
the role of science and scientists in society.

The number of subobjectives related to each objective varies considerably
from one objective to the next. For example, objective I in Science (Know the
facts and principles of science) has twenty-nine, one involving electricity,
another involving motion; objective I in Literature (Read
literature of excellence) [...]. Objective I in Literature and its related
subobjectives are shown in detail in Figure 1.


FIGURE 1

LITERATURE OBJECTIVES 


I Read Literature of Excellence 

brSi rSdLfT u ng ’ SOal d ™ds of “o -"dividual a 

str.tf - “ - 

Age 13 (in addition to Age 9) 

au,ht>r5 2n d works (Aesop or La 
Fomame, Andenen or the Brothers Gnmm, Tht Jtmgl, 

tt'TZlT"’ Cha ' k ‘ Ut ’ S W,b ■ "poet's Booh of Amen 

Mutter rcd No> “' ^ sandbu ^ w 

Age 17 (m addition to Age 13) 

nineteenth ^ ^ P^^S^ of Shakespeare, major 
Pot2^w^ Ur> ntnC,ms < En S llsh and American), 
anro.h:a t '" 1, ' ,man ' Fra «- E E Cummings, Kean 

AdU ‘‘ SuL na ru' a a d p— 0n "'fearsof forma, 

same as th ^ dc ^ n,t, °n of goals is approximate!) the 

attrition from ^ AgC * 7 ' ,f ° nc 5731115 1,10 h^^nce of 
n memor> and addition from experience 





B. Understand the basic metaphors and themes through which man has
   expressed his values and tensions in Western culture.
   Not unlike IA, this goal calls for an individual’s knowledge
   of the major texts and literary or cultural figures and themes
   of Western culture. This knowledge constitutes a cultural
   shorthand, by which one may recognize similarities between
   the past and present, by which one may recognize certain
   universals, be they prototypes like Oedipus, symbols like the
   blind seer, or themes like the struggle of Job to understand the
   nature of divinity. The end of this goal, again like the end of
   goal IA, is the ability to use this knowledge when confronting
   a new situation, either in literature or in life.

   Age 9   Know some of the common Biblical figures. (This goal
           is assigned to this age group, although it is hard to
           predict when, where, or if this knowledge is acquired.)
           Know about the Arthurian legends, a few of the Greek
           myths, American folk figures (Paul Bunyan, Pocahontas).
           Be able to recognize the use of these figures in a modern
           context (a work of literature, a sentence, a slogan,
           or a trade name). NB: This goal is applicable at all age
           levels.

   Age 13  (in addition to Age 9)
           Know most of the common Biblical figures.
           Know most of the Greek pantheon and such legends as
           those of Jason and Odysseus.
           Know the Arthurian legends, Robin Hood, several
           American figures (Tom Sawyer, Ichabod Crane, Rip
           Van Winkle).

   Age 17  (in addition to Age 13)
           Know certain of the major characters of European,
           English, and American literature (Hamlet, Captain
           Ahab, Don Quixote, Gargantua).
           Know the themes of certain Greek works (The Odyssey).
           Know certain post-Christian themes (Faust, Arcadia,
           Utopia and ideals).
           Know certain American themes (Huckleberry Finn, Moby
           Dick).

   Adult   As for Age 17, but with somewhat more sophistication
           and, at the upper levels, more knowledge. The college-educated
           adult might be better able to understand
           Job, Oedipus, or Antigone, or any of the archetypal
           stories, simply because he is older (see the introduction
           to II).





According to NAEP’s exercise development criteria, each exercise should
measure attainment of one (or more) objectives or subobjectives. One fact a
reviewer of the exercises must face, however, is that the exercises treat some
objectives and subobjectives more exhaustively than others. Certain subobjectives
are left essentially untested.

Table 1 indicates the number of exercises employed and, in parentheses,
publicly released, for Science. For objectives I and II, there are about ten
exercises per subobjective, while for objectives III and IV, there are about six.
Within age groups, there are two or three exercises per subobjective for objectives
I and II, while only one or two for objectives III and IV. Finally, within
age groups, there is about one released exercise per subobjective for objectives
I and II, and about one-half a released exercise per subobjective for objectives
III and IV.
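
The ratios quoted in this paragraph can be rechecked from the totals printed
in Table 1. The Python sketch below is offered only as an arithmetic aid. The
totals and subobjective counts are those of the table; the figure of four age
groups (9, 13, 17, and adult) is an assumption of the sketch, and dividing
evenly across age groups gives averages, not the actual count for any one age.

```python
# Table 1 totals: (exercises employed, exercises released, subobjectives)
SCIENCE_TOTALS = {
    "I": (345, 143, 29),
    "II": (93, 45, 10),
    "III": (29, 14, 6),
    "IV": (30, 10, 5),
}
AGE_GROUPS = 4  # assumption: ages 9, 13, 17, and adult


def coverage(employed, released, subobjectives, age_groups=AGE_GROUPS):
    """Average exercises per subobjective, overall and within one age group."""
    return {
        "employed_per_sub": employed / subobjectives,
        "employed_per_sub_per_age": employed / subobjectives / age_groups,
        "released_per_sub_per_age": released / subobjectives / age_groups,
    }


summary = {obj: coverage(*row) for obj, row in SCIENCE_TOTALS.items()}
for obj, stats in summary.items():
    print(obj, {name: round(value, 1) for name, value in stats.items()})
```

Objective IV, for instance, works out to six exercises per subobjective overall
and to one-half of a released exercise per subobjective within an age group.
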

The number of exercises released for objectives III and IV is much too
small to permit forming conclusions concerning achievement of these objectives.
In fact, the numbers indicate that within age groups, certain subobjectives
under III and IV were measured by only a single exercise, or perhaps
by none at all.6 This is certainly an inadequate base from which to make
inferences.

Perhaps objectives III and IV, which concern attitudes, were considered
less important than I and II, which concern facts, principles, and processes,
or perhaps III and IV are more difficult to measure. Nevertheless, once a
commitment was made to include these objectives, it seems necessary that
each subobjective should be tested by several exercises.

Table 2 gives parallel information for Citizenship. The coverage of Citizenship
exercises is more nearly balanced, although objective D (Know the
structure and function of our government) apparently is given more weight
than objective C (Help maintain law and order) or objective H (Personal
responsibility).

In both Science and Citizenship, the strict recall objectives (Know the facts
and principles of science, or Know the structure and function of our government)
receive a large proportion of the exercises.

The coverage of Writing appears reasonably well balanced, but because
objectives I, II, and III do not contain subobjectives, it is impossible to perform
the calculations carried out for Science and Citizenship. (See Table 3.)


6 Unfortunately, NAEP publications fail to list the Science exercises by
subobjective; thus, coverage of the Science subobjectives must be inferred indirectly,
by counting exercises.





Table 1

Science: Number of Exercises Employed and Released

                 Facts       Process Skills   Understanding   Appreciation
Age              I           II               III             IV

9                98 (41)     28 (13)          11 (5)          10 (2)
13               74 (28)     27 (12)           8 (4)          10 (3)
17               88 (38)     19 (9)            5 (3)           5 (3)
Adult            85 (36)     19 (11)           5 (2)           5 (2)

Total           345 (143)    93 (45)          29 (14)         30 (10)
Number of
subobjectives    29          10                6               5


1972), pp 8-11 


Age 

9 

13 

17 

Adult 
Total 
Number of 
subobjectives 


Table 2 

Citizenship Number of Exercise* E mployed and Release^ 


Welfare/ 

Dignity 

A 

Rights/ 

Freedoms 

B 

Law/ 

Order 

C 

Government 

structure 

D 

Participation 

E 

4 (2) 

14 (7) 

7 (4) 

3 (2) 

8 (5) 

4 (2) 

5 (2) 

2 (1) 

2 (2) 

3 (2) 

12 (8) 

11 (4) 

23 (10) 

21 (9) 

23 (10) 

2 (2) 

6 (3) 

11 (7) 

13 W 

10 (5) 

35 (18) 

- — ^ — 

20 (11) 

78 (33) 

32 (20) 


Rationality 

G 


(6) 

JB 


39 (22) 


Persona I 
responsibility 

U 

4 (2) 

2 (») 

2 ( 1 ) 

2 (1) _ 
10 (5) 


3 (2) 
3 (2) 


umber oi 6 — . 

Results for S«. 

C ofCmm°*'V( DcTtvcr ECS ’ I9?2> ’ 3PP ° ‘ nd 





Age 

Social 

I 

Business 

II 

Scholastic 

III 

Appreciation 

IV 

9 

13 

17 

Adult 

11 (4) 

7 (4) 

4 (3) 

4 (3) 

2 (2) 

5 (3) 

4 (2) 

5 P) 

7 (4) 

4 (2) 

4 (2) 

3 (2) 

10 (5) 

7 (3) 

7 (4) 

Total 

26 04) 

i6 run 

» w 

27 (14) 


1972) pp 11,23,35,44 


Results (Denver ECS 


A balanced distribution of exercises need not imply equal coverage of all
objectives and subobjectives. Perhaps, for example, some subobjectives are
deemed more important than others, although NAEP gives no systematic
weightings. Whatever the relative importance of the subobjectives, however,
certainly more than one exercise should be used to test each of them (for each
age group), simply to provide some judgment of validity. In fact, more than
one exercise at each of the three difficulty levels should be used to test each
subobjective.

Finally, there is little reason to think that the number of exercises devoted
to an objective should be a function solely of the objective’s importance.
Instead, objectives that are most difficult to measure probably require the largest
number of exercises, to achieve validity. In short, formal criteria should be established
for balancing exercises across objectives and subobjectives.

Exercises Which Measure More than One Objective or 
Subobjective 

According to NAEP’s criteria, each exercise should provide information
about the level of achievement of some objective or subobjective. Some exercises,
however, appear to relate to several objectives at once. Thus, while a
correct response to such an exercise may indicate achievement of all objectives
to which the exercise relates, it is uncertain what an incorrect response indicates.
For example, consider the Citizenship exercise, Exercise 1.7


7 The numbers that appear within parentheses below the exercise numbers
in this book are from the Assessment’s testing packages. The Assessment’s numbers
indicate age (13), package (12), exercise (11).





Exercise 1
(13-12-11)


Objectives: Support rights and freedoms of all individuals (recognize that everyone has a right
to have a fair trial if accused of a wrong act). Recognize the main functions of governmental
bodies (recognize that courts resolve disputes).

Age 13, Interview: Suppose the police arrest someone they think steals things. The person
arrested may or may not be guilty of stealing things. What decides if a person is guilty and has
to go to prison for stealing?

(If answer has to do with “evidence” or “confession” or the like, probe for further understanding,
for example, “Is there anything else that must happen before a person can be sent to prison?”)

Acceptable Responses: Jury, trial, judge, court.

Unacceptable Responses: No response; legally oriented response not mentioning any of the
above acceptable responses; factors such as fingerprints or eyewitnesses which could be used as
evidence at a trial.

Results: State that a jury, trial, judge, or court decides whether or not a person is guilty.

Age 13
81%

Are we to conclude that the 19 percent who failed to respond correctly do not
support individual rights and freedoms? Or do not recognize the main functions
of governmental bodies? Or both?






5. What is the density of the wood block? It is ____ grams per cubic centimeter. (An
answer between .38 and .64 was scored correct.)

Age 13

 4% Correct
70  Incorrect
25  No Response
99%


An extreme case occurs in the Science exercise, Exercise 2. As Richard J.
Merrill has pointed out,8 the exercise relates to several specific subobjectives:
ability to propose or select validating procedures, both logical and empirical;
ability to obtain requisite data; ability to interpret data; ability to reason
quantitatively and symbolically; and facts and principles relating to density.
Which of these subobjectives has the 96 percent who did not respond correctly
failed to achieve? Respondents were asked to complete several exercises
related to the example at hand, and these may provide some additional
information, but they were not released.

To clarify the relationship of the exercises to the objectives, some effort
might be devoted to arranging and analyzing the exercises in hierarchical
form (see Figure 2). Exercises at the bottom of such a hierarchy or tree would
measure single subobjectives, and those at the top would measure a composite
of those at the bottom. Development of hierarchies might call attention to the
difficulty involved in determining with any precision the subobjectives a respondent
must master in order to perform correctly on a particular exercise.9


FIGURE 2

                        Exercise D
               (Subobjectives 1, 2, and 3)
                            |
        +-------------------+-------------------+
        |                   |                   |
    Exercise A          Exercise B          Exercise C
 (Subobjective 1)    (Subobjective 2)    (Subobjective 3)


8 “National Assessment of Science Education: A Beginning,” in Report 1 [...]
Observations and Commentary of a Panel of Reviewers (Denver: ECS, 1970).

9 For example, in the hierarchy in Figure 2, it might turn out that not all
persons who perform correctly on exercises A, B, and C also perform correctly on
D, indicating that some subobjectives are involved in D other than subobjectives
1-3. On the other hand, some respondents might perform correctly on exercise D
without properly answering A-C, indicating that mastery of subobjectives 1-3 is not
necessary for exercise D. Analysis of this sort might improve both exercises and
objectives.
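
The kind of analysis suggested in connection with Figure 2 might be sketched
as follows in Python. The sketch assumes that each respondent’s record shows
whether exercises A, B, C, and D were answered correctly; the function name,
the record layout, and the sample responses are all invented for illustration,
since no such respondent-level data appear in the released materials.

```python
from typing import Dict, List, Sequence


def hierarchy_check(records: List[Dict[str, bool]],
                    parts: Sequence[str] = ("A", "B", "C"),
                    composite: str = "D") -> Dict[str, int]:
    """Count response patterns that fit, or contradict, a strict hierarchy."""
    counts = {
        "parts_right_composite_wrong": 0,       # D may involve other subobjectives
        "composite_right_parts_incomplete": 0,  # subobjectives 1-3 may not be necessary
        "consistent": 0,
    }
    for record in records:
        all_parts = all(record[p] for p in parts)
        if all_parts and not record[composite]:
            counts["parts_right_composite_wrong"] += 1
        elif record[composite] and not all_parts:
            counts["composite_right_parts_incomplete"] += 1
        else:
            counts["consistent"] += 1
    return counts


# Invented responses: True marks a correct answer.
records = [
    {"A": True, "B": True, "C": True, "D": True},
    {"A": True, "B": True, "C": True, "D": False},
    {"A": True, "B": False, "C": True, "D": True},
    {"A": False, "B": False, "C": True, "D": False},
]
result = hierarchy_check(records)
print(result)
```

A large count in either of the first two categories would suggest that the
exercise at the top of the tree does not measure what the hierarchy claims.
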





Exercises Which Fail to Measure Particular Objectives
or Subobjectives

Some exercises fail to indicate much about achievement of their related
subobjectives. For example, consider Exercise 3, a Science exercise. Presumably
this exercise is related to subobjective IIG (Ability to reason quantitatively
and symbolically) or perhaps IIE (Ability to interpret data). The
relationship between performance on the exercise and achievement of either
of these subobjectives, however, is tenuous at best. Does selection of the correct
response indicate an ability to reason quantitatively, to interpret evidence,
or, more likely, simply to read and comprehend straightforward English?

Exercise 3
(9-3-2)

Big leaves usually give off more water than little leaves. Which of the following leaves gives off
the most water?



2 


□ 



4 □ 






3 D I don’t know 

0 No response 

ToK 


Similarly, analysis of the Reading exercises indicates that although the
objectives and subobjectives are relatively refined, the exercises themselves
are not of adequate quality to test for the objectives. Some Reading exercises
employ written passages inappropriate for the objectives to be assessed. For
example, fact-finding questions, which would imply the use of literal and
factual material, are sometimes affixed to “flight of fantasy” passages, which
might be better used for assessing more literary skills. Exercise 4, a paragraph
taken out of context from How Many Miles to Babylon?, is used as a basis for
questions involving the skill, “remembering significant parts of what is
read: remembering pertinent details.” Nothing within the excerpt itself indicates
which information will turn out to be significant and pertinent. Nor is
there any sign that this selection might be taken from a mystery, which might
alert the reader to seemingly inconsequential cues. Because there is no inkling
that any nugget of information in the paragraph is either more or less germane
than any other, the selection is not ideally suited for winnowing out
and remembering important information. Although the questions are relatively
easy, the skill assessment is confounded by the nature of the exercise.

(13 8-12 A .2 15 7-5-10) 


Read the story carefully so that you can answer the questions on the next page without looking
back at the story.

It was morning and James Douglas awoke frightened. Perhaps it was because the light had
not turned on and the morning city light itself was gray and cold, hardly different from
early evening. Maybe it was because of the three old women, one bending over the sink, one
[...] the wall opposite his bed, one sitting at the table, her head bent over an empty
[...]. Maybe it was because he had been thinking about how to run away from school when he
went to bed the night before. Maybe it was because it was a cold November Monday in Brooklyn.
He closed his eyes and pretended to sleep.





Answer the following questions without referring to the story.

A. In what city does the story take place?

B. In what month does the story take place?

C. On what day does the story take place?


Other Acceptable Responses

A. 1. New York
   2. New York City


Exercise 5 is another example of the choice of a poor paragraph with
which to demonstrate a skill, in this instance, finding the main point or topic
sentence of a passage. The main idea is not embedded in any one particular
sentence, nor is the first sentence the topic sentence. The paragraph is incomplete
when taken out of context; presumably the sentences covering the definition
of “beat” are a way station to the discussion of the paragraph’s underlying
subject, the beat generation. The beat generation is thus the overall
implied topic.


Exercise 5
(131-9, 176-U)


Read the paragraph and answer the questions which follow.

■rough the last war, or at lea “ “j ™ ',, wu John Kcrouac, the author ot. tine, 

nitorm, general quality which demand, an «dj<*u.e '• J ^ u ,, „„ yean ago, 

eglected novel “The Town and die City, ” h ° ™p„het.c eye. and one day he said 
hen the face was harder to recognize, but heh rp, ? , , obscure but the 

Von know, this really a « generahon Theongi- „ „„„ die leelrng ol 

waning is only too clear to most Americans More ££ mnd . ,„d, olt.ma.ely, ol art 

,^t”3r^a^LT»d"C:r^--e..eo„nnu.,b 


from early youth 

A. What is the MAIN point of the paragraph?

   □ The beat generation
   □ The labeling of a past generation
   □ The definition of the word “beat”
   □ I don’t know

B. Where would you MOST likely find this paragraph?

   □ In the encyclopedia
   □ In a collection of essays
   □ On a sports page
   □ In the Dictionary of American Slang
   □ I don’t know

C. What does the writer suggest when he mentions a “fine, neglected novel”?

   □ Kerouac had the right idea about the war
   □ Kerouac had a clear understanding of the new post-war generation
   □ Kerouac had not received the recognition of “The Town and the City” that was
     deserved
   □ I don’t know

D. According to the paragraph, the origins of the word “beat” are

   □ obscure
   □ clear to Americans
   □ attributed to Kerouac
   □ attributed to jazz musicians
   □ I don’t know

The objective of Exercise 6 is to "interpret the sentence structures and
intonation patterns which signal both meanings and attitudes." The Reading
Objectives specify that this subobjective requires the student to have some
comprehension of context. Inquiring, "Which of the following asks a
question?" calls not for an interpretation of meaning implied in context but
for a knowledge of grammar applied in isolation.

Exercise 6 
(1312-3, 175-1) 


Which of the following asks a question?

□ Already the has answer given been
□ Been the answer already given has
□ Has the answer already been given
□ Has the given already answer been
□ The answer has been given already
□ I don't know

Exercise 7 is meant to demonstrate ability to "verify the facts through
experience." One might expect a relatively direct passage to accompany this
higher level objective. Instead, there follows a flimsy and labored excerpt
about Amos the Ant. Performance on such an item tells very little about a
person's ability to verify fact through experience.


Exercise 7
(97-14)


Read the story and answer the question which follows it.

One day Amos the Ant took his lunch to the park. He sat down under a tree and started to eat.
Then some children came over. Amos gave them each a sandwich. It was a fine day for a picnic.

How do you know this story is make-believe?

□ Ants don't eat
□ Ants aren't in parks
□ Ants eat lunch at home
□ Ants don't give people food
□ Children are afraid of ants


NAEP publications repeatedly note that the Reading exercises were
developed to be "meaningful" and "relevant" and were selected for "importance
and quality,"10 but there is considerable disparity between such descriptions
of the exercises and the exercises themselves. Beyond the problems already
cited, there is the additional fact that some of the material is, in Charles
Silberman's term, "mindless." Consider, for example, the Silky the Spider
episodes (Exercises 8 and 9), Amos the Ant (Exercise 10), Big Eyes
(Exercise 11), or Johnny (Exercise 12). All of these sections are [...] in
content and erratically and poorly written. Amos the Ant is unworthy of the


Exercise 8 

( 94 - 9 ) 


Read the story and answer the question which follows it.

Of all the things to eat in the world, Silky the Spider [...] a whole riverful
of bean soup. The trouble was that [...] especially for a spider, so Silky used
to knock on door after door asking whoever opened the door if he had any bean
soup for sale. One day, a woman named Mrs. Bean [...] When Silky found out her
name, he gulped her right down. Boy, did he have a pain in his stomach after
that!

How did Silky feel about bean soup?

□ Silky hated bean soup
□ Silky couldn't stand bean soup
□ Silky never thought about bean soup
□ Silky thought bean soup was delicious
□ I don't know


10 See General Information Yearbook (02-GIY) (Denver: ECS, 1972).


Exercise 9 
(96-13) 


Read the story and answer the question which follows it.

The thing that Silky the Spider hated most was rain. Whenever it rained, his mother would
take a big bar of soap and scrub him all over. Then the rain would wash all the dirt and soap
away. That didn't make Silky happy because he really liked dirt.

Which one of these sentences tells BEST how Silky felt about rain?

□ Silky liked the rain
□ Silky thought rain was fun
□ Silky was happy when it rained
□ Silky didn't like the rain at all
□ I don't know


Exercise 10 
(93-1) 


Read the story and answer the question which follows it. 

One day Amos the Ant took his lunch to the park. He sat under a tree and started to eat. Then 
some children came over Amos gave them some food. It was a fine day for a picnic. 

What did Amos do FIRST in the story?

□ He had a picnic.
□ He ate his lunch.
□ He climbed a tree.
□ He went to the park.
□ He found some children.
□ I don't know

Exercise 11 
(9-6-9) 


Read the story about a fish and answer the question which follows it. 

Once there was a fish named Big Eyes who was tired of swimming. He wanted to get out of the
water and walk like other animals do, so one day without telling anyone he just jumped out of
the water, put on his shoes, and took a long walk around the park.

What do you think the person who wrote this story was trying to do?

□ Tell you what fish are like
□ Tell you that fish wear shoes
□ Tell you a funny story about a fish
□ Tell you that fish don't like to swim
□ I don't know


Exercise 12 
(137-12) 


Read the passage and answer the question which follows it.

Johnny told Billy that he could make it rain any time he wanted to by stepping on a spider.
Billy said he couldn't. Johnny stepped on a spider. That night it rained. The next day Johnny
told Billy, "That proves I can make it rain any time I want to."

Was Johnny right?

□ Yes
□ No
□ Can't tell from the passage
□ I don't know


associated objective, "Follow development of an author's idea," since nothing
approaching an idea is developed. The reading skills and understanding of
American young people, even at the nine-year-old level, should not be
assessed by responses to inferior material of this sort.


The Knowing versus Doing Distinction 

Knowledge of fundamental facts and principles can be [...] number of ways,
but the way in which this is done may [...] is used in practice. For
instance, if a respondent [...] or principle being tested, he may not be able
[...] appropriate empirical circumstances. Consider Exercise 13, which falls
under objective I [...]. Can we conclude that the 61 percent who answered
correctly would be able to apply the complete circuit principle being tested
[...] life situation (e.g., to repair a faulty flashlight or electric train),
[...] battery, wire, and light are not as clearly [...]? Although a student
may respond correctly [...], he may never in fact be able or motivated to
employ the knowledge in [...].

It may demand a more [...] the extent to which factual knowledge is used in
practice [...] at hand; for example, the respondent might be [...] of items
which can be used to establish electric circuits ([...] switches, etc.). He
could then be asked to build something with [...] items. The child's
sophistication in fundamental principles [...] determined by the complexity
[...] he works. (Of course, some students [...] rather than a circuit.)


Exercise 13
(9-6-15)


Jane wrapped the end of a piece of wire around the base of a flashlight bulb. When she touched
the bottom of the bulb to the center of the top of a new battery, the bulb did not light. What
should Jane do next to light the bulb?

Age 9
 13%   Touch the end of the wire to the bulb
  4    Put the end of the wire in a drop of water
 11    Touch the bulb to the bottom of the battery
 61    Touch the end of the wire to the bottom of the battery
 10    I don't know
       No response

Examples such as Exercise 13 are widespread. The measurement of doing
as distinct from knowing appears to be a problem NAEP has not yet been
able to solve. A systematic research effort is needed to develop test
instruments which assay performance, although such effort might best be
carried out by some group other than NAEP. The results of such an effort would
have applications not only for the Assessment but also for the testing field
generally.

Process versus Factual Knowledge

While some objectives concern factual knowledge, others concern the skills
and abilities needed to engage in the process of a subject-matter area. The
assessment of process in Science is often quite inventive. For example,
Exercise 14 provides an excellent vehicle for the respondent to demonstrate
whether he can think about problems in a scientific way.


Exercise 14
(9-6-9)


[...] salt and sugar with water and let the mixture stand, you get salt water
taffy, a kind of candy. Which of the following would be the best way for you
to tell [...] this idea?

Age 9
  3%   Take a vote among your friends
 13    Buy some salt water taffy and see if it has salt in it
  7    Find out if salt and sugar have the same chemicals in them
  6    Grind up some salt water taffy to see if you get salt, sugar, and water
 66    Try to mix salt, sugar, and water, let them stand, and see what happens
  5    I don't know
  0    No response
100%


Exercise 15
(92-10)


Why do very few people get smallpox in the United States today?

Age 9
 12%   The weather conditions have changed
 50    Most people get smallpox vaccinations
  6    People move more often than they used to
  4    People drink more milk today than ever before
 17    All the germs that cause smallpox have been killed
 11    I don't know
  0    No response
100%


Some exercises unfortunately fall into the trap of really measuring fact
while purporting to measure process. For example, in Exercise 15, the proper
response requires knowledge of facts, since weather, milk, or eradication of
germs, although unlikely, are possible causes for reduction of smallpox. The
alternative hypotheses must be tested by evidence not at hand in the exercise.


Measurement of Attitudes 

The measurement of attitudes, like other psychological measurement,
usually involves multiple test items, which bear complex interrelations.
Attitude measures are commonly obtained by summing the results on several
items, and lie-scales and measures of internal consistency often are built in.
The NAEP decision to create exercises each of which can stand alone,
independent of the others, forced a departure from these conventional attitude
measurement techniques.
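
The conventional approach described above, summing several items into one scale score and checking internal consistency, can be sketched as follows. This is only an illustration under stated assumptions: the response data are invented, and Cronbach's alpha is used as one common internal-consistency measure; the text does not say which measure any particular instrument uses.

```python
from statistics import pvariance


def scale_scores(responses):
    """Sum each respondent's item responses into a single scale score."""
    return [sum(row) for row in responses]


def cronbach_alpha(responses):
    """Cronbach's alpha: one common index of internal consistency.

    `responses` is a list of rows, one row per respondent, each row
    holding that respondent's numeric answers to the same k items.
    """
    k = len(responses[0])
    # Variance of each item taken across respondents.
    item_variances = [pvariance([row[i] for row in responses]) for i in range(k)]
    # Variance of the summed scale scores.
    total_variance = pvariance(scale_scores(responses))
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: five respondents, three items coded
# 1 = never, 2 = sometimes, 3 = often.
responses = [
    [3, 3, 2],
    [1, 2, 1],
    [2, 2, 2],
    [3, 2, 3],
    [1, 1, 1],
]

totals = scale_scores(responses)
alpha = cronbach_alpha(responses)
```

A single stand-alone exercise has no such scale score and no consistency check, which is the departure the paragraph above describes.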

Unfortunately, the individual attitude exercises developed in this way are
often trivial. For example, to what extent does Exercise 16 really measure the
desired objective, "Have attitudes about and appreciations of scientists,
science, and the consequences of science"? First, it is difficult to interpret
the response choices (often, sometimes, never) because no basis for comparison
is provided. The exercise would be more useful were it part of a set of similar
items concerning several types of television specials: political documentaries,
dramatic productions, concerts, and so on. Furthermore, should not a careful,
thinking respondent be selective in choosing TV science programs? Would a
seventeen-year-old with positive attitudes toward science choose a scientific
program over a sports program scheduled simultaneously? Also, might not a
respondent prefer doing science to watching it on television? Problems of
interpretation render this exercise of little value.

Exercise 16
(1711 10a)


(a) If you learn about a special television program dealing with a scientific
topic, do you watch it?

Age 17
 17%   Often
 64    Sometimes
 19    Never
  0    I don't know
  1    No response
101%


A Citizenship exercise, Exercise 17, provides some insight into another
difficulty. A respondent to this exercise must be influenced strongly by
[...] believes is the desired or expected response. But there is no way to
assess the depth of conviction or the behavioral consequences of belief. An
objective of such importance ("Treat all persons with respect; do not condemn
others on the basis of irrelevant personal or social characteristics") merits
a more complex evaluation, perhaps by means of an open-ended interview.
A few attitude exercises begin to assess the degree to which respondents are
able to make complex and subtle judgments and distinctions, for example,
Exercise 18. However, most attitude exercises suffer from poor construction,


Exercise 17
(1311-18, 174-10, A5-22)


A4.* Objectives: Treat all individuals with respect; do not condemn others on the basis of
irrelevant personal or social characteristics. (Do not avoid associating with others because of such
differences. Reliance on stereotypes in reacting to members of minorities shows little respect for
them as individuals.)

Ages 13 and 17: Tape-accompanied paper-and-pencil
Adult: Paper-and-pencil in individual interview

People feel differently toward people of other races. How willing would you be to have a person
of a different race doing these things?

[For each situation below, the choices were: willing to, prefer not to]

                                                          Age
                                                     13    17   Adult
A. Be your dentist or doctor?                        80%   70%   74%
B. Live next door to you?                            82    71    67
C. Represent you in some elected office?             80    77    82
D. Sit at a table next to yours in a crowded
   restaurant?                                       82    83    88
E. Stay in the same hotel or motel as you?           88    85    89

Willing to associate with a person of a different
race in 3 or more of the above situations            90    89    87
   4 or more                                         77    79    77
   all 5                                             57    57    57

*Not administered to the in-school sample in one large Western state, one Southeastern
county, and one Southwestern city at the request of state or local authorities.


A2- - Objectives Are loyal » ^ ” lh 

nd actions ) 


actions ) 

gee 9 and 13 mend. «s P« mj.d "»!n MW M 

Suppose the father of one of yoor k „e,v that the father was ,» jail (>=■ 

friend to come to ymr fjr)jcr man • to pM ”> 

(At Age 13, "m * „ „„ aslcd to mire that eh,ldre» 

(„ -No” to A) Why no. (» , eh „„ „ p ,„„ their position ) 

understood Part A and to g> 
the Child didn t stem 



90 


A STUDY OF THE NATIONAL ASSESSMENT 


no reason not to now; don't take it out on the friend; you can't just let a good friend go; to
cheer the friend up; she would be lonesome without her father; I would like to make the
child happier; it doesn't matter to me.

Unacceptable reasons to C:** No response, I don't know, or a few responses inconsistent
with a "Yes" to A: Child might steal like the father; friend would be too worried or too
embarrassed to come; I wouldn't want the other child around; it might get me in trouble;
I'd want him to stay home with his mother.

Results
                                                                      Age
                                                                    9     13
Indicated willingness to associate with friend whose father
was in jail (Yes to A)                                             56%   79%
Indicated willingness to associate and gave an acceptable
reason for doing so (acceptable reason to C as well as Yes to A)   48    76

*Not administered at Age 9 in one Southwestern city at the request of local authorities.
**Descriptions of acceptable and unacceptable responses for all exercises are summaries of the
responses actually made, but are not verbatim, of course, since each type describes a variety of
wordings from a large number of respondents.


insufficient relation to the objectives, and an undue likelihood that the
respondent will be influenced by his perception of the desired response. NAEP
might have experimented with alternate formats, such as the semantic
differential and the Q sort.


Innovative Exercises 

While most NAEP exercises are traditional multiple choice, some are quite
unusual. For example, in a number of Science exercises the respondents were
given items of equipment (a balance and weights, a block of wood and a
ruler, a pendulum and a clock, etc.) and asked to complete specified tasks.
Although in certain cases (notably Exercise 2, the density exercise discussed
earlier) the experiments to be performed overlap a number of objectives at
once, in general these exercises begin to tap the ability of respondents to
behave in a scientific way. The Science exercises in which students perform
experiments are among NAEP's greatest achievements in testing.

One of the most interesting and unconventional exercises is a Citizenship
example, Exercise 19. The exercise provides some information on the extent
to which participants in small group decision-making situations take clear
positions, defend their points of view, and reach compromise.





Exercise 19 
(1310-1, 1712-1) 


E14 (Group Task) Objectives: Apply democratic procedures on a practical level when working
in a group. Try to inform themselves on socially important matters and to understand alternative
viewpoints. Weigh alternatives and consequences carefully, then make decisions and carry them
out without undue delay. Have good ideas for solutions. Support free communication and
communicate honestly with others (willingly express their own views on civic and social matters,
however controversial the issue; encourage the hearing of dissenting viewpoints). Defend rights
and liberties of all kinds of people uniformly (defend the right of a person to express his
opinions).


Ages 13 and 17: Observed group interaction

Exercise Description: The purpose of this exercise was to provide a standard [...] to measure
effective cooperation in a group task. A group of eight students [...] from a list of twelve
issues (see lists below) the five [...] adults, to rank them in order of importance, and to [...]
the two most important problems, and for all five problems [...]. They had thirty minutes to
complete the task. The only rule was that a majority of the group must agree on anything they
wrote.

Two observers recorded individual acts of group members as they discussed the issues, each
observer recording different types of behavior. At no time did the observers participate in the
discussion. Observers were specially trained in recognizing the individual actions they were
responsible for recording.

The task seemed appropriate and interesting [...] enthusiasm. The types of behavior [...]
organization, contribution of substance, and defending a viewpoint. The complete set of the
behavioral categories is listed in the results section on the next page.


List of Issues 
Age 13 

Time Limits (for being home, in 
bed, etc.) 

Home Duties 
School Assignments 
Adult Books and Movies 
Sports and Other Activities 
Dating and Parties 
Parents' Approval of Friends

Money (where from and how spent) 

Dress and Appearance 

Smoking 

Swearing uke w Adu i, 


Age 17 

Censorship 

Curfew 

Voting Age 

Drinking 

Smoking 

Working Rules and Laws 
Marriage 
Auto Insurance 
Dress and Appearance 
Military Service 
School Attendance 
Civil Liability 
Criminal Liability 





Results (Positive Actions) 

% Who Did Thu 
at Least Once 
Age 

Toole a dear position 67% 

Oave a reason for a point of view 67 7g 

Sough, information rola.od to the game from other loam member. or from 
tne administrator ^ 

Stood dio taol by organmng dio group or by ouggoming a chango in ptoce- 

Dtoidod tho nght of anolficr group mombor to bo hoard or ,o hold a differ ’’ 
ent opinion ^ 

Defended own viewpoint contrary to a previous consensus* 6 24 

who never a, ^ ^ P< ” 1 ' 1Ve actlons » one negative action was recorded The percentage of students 
who jumt acted in such a manner throughout a session is reported here 

Avoidance of Negative Actions 


N m™otr ~ — ” - - — do ‘ 3 

93% 

Group Effecuveness 


% Who Never 
Did Thu 
Age 

17 


% of Groupi 
Performing Action 
Age 

Soloctod five mo., important imre, £ % 

z? — - *• - - - — 

U.k)' “ d rccomn, endations for all of thorn (tomplolod 

58 23 

Bine nhhe groum'dld ^ effccUvc S^P interaction to otabluh any formal procedure., 
■even, eon. 30 percent * IccJd, j'*""™ A ' *" 23 P'™* 1 - " d *' 

provioudy taken at l'Z ^Toto " ''‘tSTTT’ “ "6“' agaimt a pootion 

at least initially At f lmn , . °* the group He had to “stand alone” in this act, 

accept «™r" L 1 T — able to convince ,he gmnp to 

poto, a, a"~ “,^ h : “»• » I—' •' "3^ dmtoen and ,7 

mg a contrary view oapl.cdy yielded eiih"”^ E "’“ P “ h '' oI * h “' drf " d 

y y* , either being convinced of the majority view themselves or 





yielding so that the group could continue the task. Of those who initially defended a contrary
view, 18 percent at age thirteen and 26 percent at age seventeen yielded in this way.


Dr. Tobe Johnson has offered some telling criticisms of this and similar
NAEP exercises.11

[These exercises] are rather pallid simulations of the kind of
effective cooperation essential to getting complex and controversial
things done cooperatively in this society and which will [...]
increasingly, the real test of effective citizenship and even the
survival of this country as a nation state. [One such exercise]
measures cooperation only at the cognitive level, that is [...]
toward a simple task where goals and values are [...].
The same is largely true of exercise (1310-1 [...]) [Exercise
19] which might have occurred among heterogeneous groups [...]
contrasted with more homogeneous ones.

It is certainly essential that exercises involving decision [...]
of conflicting goals and values be developed in [...]; [NAEP]
has made a good beginning in creating a [...] exercises in
group process. The fact that, as Dr. Johnson [...], the
group exercises are presently "pallid simulations" [...] the
reality that they do innovatively measure some important abilities, albeit not
all or even the most important.


THE NAEP EXERCISES AND
CRITERION REFERENCED MEASUREMENT

Criterion Referencing: Some Theoretical Considerations

NAEP's attempt to create [...] subject matter objectives is not only a
distinguishing feature of the Assessment but is also one of the most
difficult challenges it has faced. In the last decade a growing [...]
problem of relating tests to instructional objectives [...] of testing,
often called criterion referenced measurement.12 This relatively new
phenomenon has generated considerable excitement and an original set of
questions about testing. Any judgment of the success of the exercises
developed by NAEP must be informed by a critical understanding of criterion
referencing.


11 [...] a Critical Response, National Assessment Report 2 (Denver: ECS, 1970), p. 21.

The current excitement surrounding criterion referenced measurement
among educational measurement specialists was initiated by Robert Glaser in
a paper published in 1963.13 There, Glaser contrasted criterion referenced
measurement (CRM) and the traditional norm referenced measurement (NRM).

At the most elementary level, the difference between CRM and NRM lies
in the standard used as a reference in judging respondent performance. CRM
depends on an absolute standard of performance, as measured against mastery
of specific knowledge and skills, while NRM depends on a relative standard,
as measured against the performance of other students. Glaser writes:

Underlying the concept of achievement measurement is the notion
of a continuum of knowledge acquisition ranging from no
proficiency at all to perfect performance. An individual's
achievement level falls at some point on this continuum. The degree
to which his achievement resembles desired performance at any
specified level is assessed by criterion referenced measures of
achievement or proficiency. The term "criterion," when used
in this way, does not necessarily refer to final end-of-course
behavior. Criterion levels can be established at any point in instruction
where it is necessary to obtain information as to the adequacy of
an individual's performance. The point is that the specific
behaviors implied at each level of proficiency can be identified and used
to describe the specific tasks a student must be capable of
performing before he achieves one of these knowledge levels. It is in
this sense that measures of proficiency can be criterion referenced.

On the other hand, achievement measures also convey
information about the capability of a student compared with the
capability of other students. In instances where a
student's relative standing along the continuum of attainment is
the primary purpose of measurement, reference need not be made
to criterion behavior. Educational achievement examinations, for
example, are administered frequently for the purpose of ordering
students in a class or school, rather than for assessing their
attainment of specified curriculum objectives. When such norm
referenced measures are used, a particular student's achievement is
evaluated in terms of a comparison between his performance and
the performance of other members of the group.


12 NAEP publications often use the term criterion referenced to describe the
Assessment exercises [...] has decided that "objectives-referenced" is
more appropriate [...] raised by the debate on criterion referencing are
central to the Assessment's attempt to relate exercises to objectives.

13 Robert Glaser, "Instructional Technology and the Measurement of
Learning Outcomes: Some Questions," in W. James Popham, Criterion Referenced
Measurement (Englewood Cliffs: Educational Technology Publications,
1971), pp. 5-16.
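
The two reference standards contrasted above can be made concrete with a small sketch. Everything in it is an assumption chosen for illustration: the names, the scores, the mastery cutoff of 80, and the use of a simple percentile rank as the norm referenced score.

```python
def criterion_referenced(scores, cutoff):
    """Judge each respondent against an absolute standard (the cutoff)."""
    return {
        name: ("mastered" if score >= cutoff else "not mastered")
        for name, score in scores.items()
    }


def norm_referenced(scores):
    """Judge each respondent against the group: the percent of the group
    scoring strictly below him (a simple percentile rank)."""
    n = len(scores)
    return {
        name: 100.0 * sum(1 for other in scores.values() if other < score) / n
        for name, score in scores.items()
    }


# Hypothetical test scores out of 100.
scores = {"Ann": 92, "Ben": 85, "Cal": 81, "Dee": 60}

mastery = criterion_referenced(scores, cutoff=80)
ranks = norm_referenced(scores)
```

In this invented group three of the four respondents meet the absolute standard, yet the norm referenced scores still spread them from the 25th to the 75th percentile: the same data answer two different questions.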


A second distinction between CRM and NRM concerns the characteristic
ways the scores on the two types of tests divide groups of respondents. A
criterion referenced test should be designed to divide students into just
[two] groups: those who have mastered the material and those who have not. A
norm referenced test, on the other hand, should be designed [...]
and therefore should be constructed to maximize the variability [...]
even among students all of whom have mastered the [...]. (It is
interesting to note that in the great number of heroic attempts to [...]
educational production functions, i.e., functions relating [...]
outputs, the output measures used have invariably [...]. Perhaps
it is partly because such tests are explicitly [...] maximize variance
within rather than among treatment groups [...] results have
been found.)

How can we tell if a measurement device [...] constructed? Traditional
educational measurement emphasizes [...] developed for
norm referenced tests: reliability and validity. Reliability is essentially a
measure of repeatability. A reliable test should produce [...] results each
time it is administered, as long as the respondent [...] in
relevant respects.

Validity is an indicator of whether [a test] measures the behavior it
sets out to measure. Theorists [...], and construct validity.
For example, the validity of the [...] Aptitude Test (SAT) might be
judged [...] SAT scores can be used to [...] the degree to which it
correlates with other measures [...] anxiety (construct validity).


Ibid PP 





When viewed from the vantage points of reliability and validity, criterion
referencing begins to lose some of the "charisma" it enjoys at first
consideration. Several contradictions become evident, and these have not yet
been resolved by measurement theorists. For example, the usual statistical
estimate of reliability cannot always be applied to CRM tests (due to small
within-group variances), and when the estimate can be computed, it often is
difficult to interpret. What is required for CRM is an index of reliability
that indicates the power of a test to discriminate reliably between pre- and
post-treatment groups, rather than the usual index designed for norm
referenced tests, which indicates the power of a test to discriminate reliably
among [individuals]. Such an index has not yet been developed, nor is it clear
what it would be like.

[A second] problem concerns the consistency of test items. Theorists have
[developed] a number of tools which can be used in NRM to analyze the extent
to which an individual item in a test contributes to the overall power of the
test. One of [these] is the discrimination index, which is a measure of the
percentage of those performing highly on the overall test correctly answering
the item under consideration less the percentage of those performing poorly
on the test who [correctly answer] the item. An item should be retained [...]
only if it is "positively discriminating" (that is, if performance on the
item and on the test as a whole are positively related).
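
The discrimination index described above can be sketched in a few lines. The text does not say how the high and low performing groups are formed, so the split used here (the top and bottom halves by total score, set by a `fraction` parameter) is an assumption, and the data are invented. The index is returned as a difference of proportions rather than percentages.

```python
def discrimination_index(item_correct, total_scores, fraction=0.5):
    """Proportion of high scorers passing the item minus the proportion of
    low scorers passing it.

    item_correct -- 1 or 0 per respondent for the item under consideration
    total_scores -- each respondent's score on the overall test
    fraction     -- share of respondents placed in each group (an assumption)
    """
    n = len(total_scores)
    group_size = max(1, int(n * fraction))
    # Rank respondents from highest to lowest total score.
    ranked = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    high, low = ranked[:group_size], ranked[-group_size:]
    p_high = sum(item_correct[i] for i in high) / group_size
    p_low = sum(item_correct[i] for i in low) / group_size
    return p_high - p_low


# Hypothetical data for six respondents.
total_scores = [9, 8, 7, 3, 2, 1]
good_item = [1, 1, 0, 1, 0, 0]   # passed mostly by high scorers
bad_item = [0, 0, 0, 1, 1, 1]    # passed only by low scorers

d_good = discrimination_index(good_item, total_scores)
d_bad = discrimination_index(bad_item, total_scores)
```

A positive value marks a "positively discriminating" item; the second item, with a negative value, would be a candidate for removal under this criterion.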

The discrimination index, however, may [not] be an appropriate way to
judge the [...] a CRM test. R. C. Cox and J. S. Vargas have
defined a [...] test difference index, which is the percentage of those
passing the item under consideration following treatment less the percentage
passing before treatment. Interestingly, in a study of two arithmetic tests,
[the correlations between] the Cox and Vargas index and the discrimination
index were only 0.37 for one test and 0.40 for the other. Apparently, items
which


[...] Husek and K. Sirotnik, "Item Sampling in Educational Research," [...]
American Educational Research Association, Chicago [...].

[...] generally estimated as the [...] correlation coefficient of
the items on a test (internal consistency) or as the correlation coefficient of
test scores obtained by subjects taking the same or similar test twice (or by
subjects taking a single test which is split into halves for scoring
purposes).

[...] "A Comparison of Item Selection Techniques for Norm Referenced and
Criterion Referenced Tests," paper read to the National Council on
Measurement in Education, Chicago, February 1966, cited in Popham, op. cit.


receive high marks on one index receive low marks on another. Thus,
depending on the purpose of the test (discrimination between individuals or
discrimination between treatments), different item analysis techniques may
be in order.
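
The Cox and Vargas difference index, as defined in the passage above, compares the same item before and after treatment. A minimal sketch, with invented data and with the result expressed as a difference of proportions rather than percentages, might look like this:

```python
def difference_index(pre_correct, post_correct):
    """Proportion passing the item after treatment minus the proportion
    passing it before treatment."""
    p_pre = sum(pre_correct) / len(pre_correct)
    p_post = sum(post_correct) / len(post_correct)
    return p_post - p_pre


# Hypothetical item results (1 = passed, 0 = failed) for four students.
pre = [0, 0, 1, 0]    # before instruction: 1 of 4 pass
post = [1, 1, 1, 0]   # after instruction: 3 of 4 pass

gain = difference_index(pre, post)
```

Because this index looks only at change across treatment and ignores how respondents rank on the test as a whole, there is no reason to expect it to agree with an index that sorts individuals within a single administration.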

Validity is a more difficult issue than reliability. Most proponents of CRM
argue that content validity (rather than predictive or construct validity) is
most important for criterion referenced measures: test items should be
representative of the actual proficiencies being assessed. To date, criterion
referenced measurement has been used most successfully in basic skills
training such as programmed arithmetic instruction, where content validity is
relatively easy to define. For example, the criterion "all students should be
able to add columns of ten three-digit numbers with 80 percent accuracy" is
easily measured by constructing a test consisting essentially of random
samples of such long addition problems. CRM has also been used to evaluate
[...] reading skills (see especially Rodney Skager's work at [...]).
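
The arithmetic criterion quoted above is simple enough to sketch directly. The function names, the fixed random seed, and the test length of ten problems are assumptions made for illustration; only the task (columns of ten three-digit numbers) and the 80 percent standard come from the text.

```python
import random


def make_problem(rng, rows=10):
    """One addition problem: a column of `rows` three-digit numbers."""
    return [rng.randint(100, 999) for _ in range(rows)]


def make_test(n_items, seed=0):
    """Draw a random sample of such problems from the task domain."""
    rng = random.Random(seed)
    return [make_problem(rng) for _ in range(n_items)]


def meets_criterion(test, answers, required=0.80):
    """True if the share of correctly summed columns reaches the standard."""
    correct = sum(1 for column, answer in zip(test, answers) if answer == sum(column))
    return correct / len(test) >= required


test = make_test(10)
right_answers = [sum(column) for column in test]

# A hypothetical student who misses two of the ten problems.
two_wrong = [a + 1 if i < 2 else a for i, a in enumerate(right_answers)]
# A hypothetical student who misses three of the ten problems.
three_wrong = [a + 1 if i < 3 else a for i, a in enumerate(right_answers)]
```

Content validity is easy to argue here because the test items are drawn from exactly the class of tasks the criterion names; for less sharply bounded domains that argument is much harder to make.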

In discussing performance criteria, Alfred Garvin of the University of
Cincinnati has proposed a radical thesis:

Our primary concern is with ^“TmeMinelu 11 "™'"'' 13 *° lh “ c 
tlonal objectives The of and the 

instructional objectives dictates b P instructional ob 

necessity lor CRM The relevance ^ ^ 

jectives is inherent in the center, ( ^ insm ,ct.on we 

ar^To^e^CR’hfandNRM. 

Garvin asserts that performance criteria can be [...]
if some extra-classroom proficiency is required. He applies this principle
to reach several important conclusions:

, unless a. leas, one 

vant because there u w cr “'™” nslb ,I,ty or other ethical con 

s, derations demand^ sE". «— ' a * ° ^ 1 ' ” 


. and Level » “ P 





"qualified" for them by formal instruction, then CRM of the
outcomes of such instruction is clearly indicated. The criterion here is
the licensing standards of the profession involved. However,
entry to such professional training is typically based on NRM since
training capacity imposes a "quota."

3. In any instructional sequence where the content is inherently
cumulative and the rigor progressively greater, CRM should be
used to control entry to successive units. However, if there are
several different sequences, differing widely in rigor, NRM is
more useful in making appropriate placements.

4. There are certain content areas to which criteria do apply but
not everyone need meet them. These are the "required subjects";
everyone must try to learn them, if only as a matter of public
policy, but it is almost preordained that some of them will not.

If any conclusion is supported by the research on criterion referencing, it is
that there are as yet no strictly logical or statistical procedures for
assessing the validity or even the sense of a criterion referenced measure.
Trying to [decide] whether a test is in fact truly a measure of some criterion
behavior is [...] like trying to decide whether a machine exhibits artificial
intelligence: a completely rigorous decision procedure is not possible. In the
field of information science, the notion of a Turing Test has been proposed: a
[machine] is said to pass the Turing Test of intelligence if an observer
cannot tell whether he is communicating with a person or with the machine.

A similar Turing Test of criterion referenced measurement might be
proposed: a test is valid if an observer can, after looking at a respondent’s
results, make a judgment about the respondent’s capabilities with
respect to a criterion. This certainly seems to be a weak tool for test analysis,
but Glaser, who started it all, has reached a similar conclusion:

n0rm rrf ' ren ^d and criterion refer 
the informatio ^tf” determined by examining the specificity of 
dommoXT f h? 0bta ' n ' d b r »>>= teat in relation to the 
domain and 1 , 'T' faigiral transition from the test to the 

cZnhshed fn asain fr ° m ,he domiun s bould be readily ac- 
S m fdem,r r n ° n [ cfcrcnc ' d '<*“• to that there ,s little dif- 
tasks that can SOm<: degree of confidence the class of 

Tetimd b^rr,^ P' rfo 'T'd This means tha, the task domain 
"b^aWe teha " r ' f r n “ d "lust be defined terms of 
”he Sonn^n e T “ 3 ^PresentaUve sample of 

performance domain from which competence is inferred.”


M Ibid., pp. 62-63.

11 Robert Glaser, “A Criterion Referenced Test,” in Popham, op. cit., p. 44.





Criterion Referencing and the NAEP Exercises 
The debate on criterion referenced measurement provides some basis on which
NAEP’s attempt to relate its exercises to its objectives can be judged. We
will consider the NAEP exercises in two ways: first, through analysis of the
exercises as they actually were constructed in the assessments to date, and
second, through analysis of the NAEP philosophy of criterion referencing.

Although the examination above indicated that CRM remains a cloudy
concept, certain explicit features of the CRM topology are recognizable.
It involves absolute standards of performance, and its goal is to discriminate
between groups with inadequate mastery (“pre-treatment groups”) and
groups with criterion-level mastery (“post-treatment groups”). More precisely,
CRM is intended to maximize variability of response between treatment
groups, and minimize variability within groups.
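
The contrast between variability between and within treatment groups can be
made concrete with a small calculation. The following Python sketch is only an
illustration, not a procedure used by NAEP: the function name
variance_components and the percent-correct scores are hypothetical. It splits
the total variance of a set of scores into a between-group part and a
within-group part; a criterion referenced measure of the kind described above
would aim for a large first number and a small second one.

```python
from statistics import mean, pvariance


def variance_components(groups):
    """Split total score variance into between-group and within-group parts.

    groups: dict mapping a group name to a list of numeric scores.
    Returns (between, within); their sum equals the population variance
    of all scores pooled together.
    """
    all_scores = [s for scores in groups.values() for s in scores]
    grand_mean = mean(all_scores)
    n_total = len(all_scores)
    # Spread of the group means around the grand mean, weighted by group size.
    between = sum(
        len(scores) * (mean(scores) - grand_mean) ** 2
        for scores in groups.values()
    ) / n_total
    # Spread of individual scores around their own group mean.
    within = sum(
        len(scores) * pvariance(scores) for scores in groups.values()
    ) / n_total
    return between, within


# Hypothetical percent-correct scores, invented for illustration only.
scores = {
    "pre_treatment": [10, 20, 10, 20],
    "post_treatment": [90, 80, 90, 80],
}
between, within = variance_components(scores)
print("between-group variance:", between)
print("within-group variance:", within)
```

With these invented scores nearly all of the variance lies between the two
groups, which is the pattern CRM is meant to produce.
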
A study of nine sets of NAEP subject matter objectives reveals that only
two come close to stating performance criteria:


Music: While each ability level of the four age groups would be
presented tasks relative to each major objective, it is not expected
that this would be true for the subobjectives. In reading music or
in knowledge about it, for example, certain subgoals would be
appropriate for only the top 10 percent and then would not
necessarily be accomplished by all persons in that group.

Science: The delineations are, in general, written in terms of what
approximately half of the people at a given age level might be
expected to know or be able to do.32


Neither of these, of course, is very far on the path toward explicit standards of
performance. Music sets only a vague 10 percent standard for some subgoals,
while the science standard is somewhat more specific. A more important
problem, however, is that the criteria are worded not in terms of the fraction
of the exercises a student must be able to perform correctly in order to
achieve each major objective but rather in terms of the fraction of the
students that should achieve each subgoal or major goal. The fact that only two
of seven subject matter areas even attempt to set standards, and that the
standards set are stated in an inverted form reminiscent of norm referenced
testing, is highly disturbing.

The Assessment has never framed specifications of the behaviors of which a student


22 NAEP objectives booklets 





must be capable in order to meet each objective or subobjective. Although the exercises
are loosely related to the objectives, there is no way of knowing the level of
performance on the exercises that would indicate achievement of the
objectives. Thus, it is impossible to determine, for example, the number of
seventeen-year-olds who have mastered (at the appropriate level) the facts and
principles of science (Science objective I). To make this determination would
require resolving difficult problems of reliability and validity as well as
setting performance criteria.

It may well be, as Alfred Garvin has suggested, that performance criteria
in some subject matter areas cannot be established, since no specific
extra-classroom performance is required in these areas. If this is the case, it is
oois to speak of criterion referencing such subject area tests 
The Assessment has more closely approached CRM with respect to variability
than with respect to standards. The released exercises include many 10
percent and 90 percent examples in addition to the usual 50 percent; thus it
is expected that within treatment groups, variability should be low. Since the
exercises have not yet been used to measure treatments as such, however,
there is no empirical evidence concerning treatment variability
(discrimination).

In summary, the Assessment appears to have developed a modified testing
method, somewhere between the norm referenced and criterion referenced
approaches, relying on the development of 10 percent, 50 percent, and 90
percent exercises. But even this small step toward criterion referenced
measurement is reduced in its significance by inadequacies involved in NAEP’s
attempts to predict the number of exercises answerable by particular
percentages of the population. Analysis of the reading exercises indicates the
problems NAEP was unable to deal with in defining difficulty levels.


Working Definition of “Difficulty” 

One of the prime requisites for the development of exercises at various
levels of difficulty, and fundamental to a claim of comparability among
exercises, is a concise working definition of “difficulty.” As used in the
Assessment, difficulty is not defined in the NAEP literature, and an examination
of the exercises leads to the conclusion that the term is loosely construed and
erratically applied.

Classification of Exercises. NAEP has not provided a scale or criterion reference
against which the difficulty of material might be judged. In Reading,
for example, the level of difficulty of an exercise is probably a function of
conceptual complexity, density, vocabulary, and length, interacting with the
reader’s experience and ability. Little systematic weighting of these factors is
apparent in the exercises, however, and ranking seems to have been governed
solely by committee consensus, with no common guidelines.

This consensus method of ranking exercises for difficulty did not yield
accurate predictions; many exercises do not reflect the anticipated difficulty
levels. Perhaps had there been some definition and agreement on the factors
constituting difficulty, consensus of experts would have been more
meaningful. Consider, for example, Exercise 20, which is rated medium for
nine- and thirteen-year-olds. The objective here is to demonstrate ability to use
function words as an aid to getting the meaning. Since the picture shows only
one object (the sign) anywhere about the door, the respondent need not even
know the meaning of “sign” or “hanging” and is left with the task of
recognizing and deciding among “by,” “on,” “over,” or “near” to describe the
location of the sign. Since “on” is an easily recognized and understood
word, it is not clear what, if any, criteria were used to rate this exercise
as medium instead of easy.



Look at the picture and fill in the oval beside the sentence
which tells BEST what the drawing shows.

□ A sign is hanging by the door
□ A sign is hanging on the door
□ A sign is hanging over the door
□ A sign is hanging near the door
□ I don’t know





Comparability. It is not always possible to use examples of passages rated
easy, medium, or hard, to rank other exercises. Exercise 21, for example, is
easier than Exercise 22 on any measure such as complexity of thought,
vocabulary, or length, yet both passages are ranked as easy. In fact, the national
response for nine-year-olds was 83.4 percent correct for Exercise 21 and 26.9
percent for Exercise 22. Although the respective objectives involved (“Follow
the development of an author’s idea” and “Determine the main idea”) are more
or less equivalent in complexity, application of the skill to Exercise 22 is more
difficult due to the density of information. It is not apparent on what grounds
the passages were estimated to be of comparable difficulty.
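
The mismatch between an assigned rating and the observed response can be
checked mechanically. The sketch below, in Python, assumes that the labels
easy, medium, and hard correspond to the 90, 50, and 10 percent targets
discussed earlier; that mapping, the 15-point tolerance, and the function names
are assumptions made for illustration and are not taken from NAEP documents.
Only the two nine-year-old response rates come from the text.

```python
# Assumed target percent-correct for each difficulty label (illustrative only).
TARGET_PERCENT = {"easy": 90.0, "medium": 50.0, "hard": 10.0}


def rating_gap(label, observed_percent):
    """Observed percent correct minus the assumed target for the label."""
    return observed_percent - TARGET_PERCENT[label]


def is_consistent(label, observed_percent, tolerance=15.0):
    """True when the observed rate lies within `tolerance` points of the target.

    The default tolerance is a hypothetical choice, not a NAEP standard.
    """
    return abs(rating_gap(label, observed_percent)) <= tolerance


# Ratings and nine-year-old national response rates cited in the text.
exercises = [
    ("Exercise 21", "easy", 83.4),
    ("Exercise 22", "easy", 26.9),
]

for name, label, observed in exercises:
    gap = round(rating_gap(label, observed), 1)
    print(name, label, observed, gap, is_consistent(label, observed))
```

Under these assumptions Exercise 21 falls within the tolerance for an easy
item, while Exercise 22 misses the assumed target by more than sixty points.
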

Exercise 21 
(9€-5) 


Read the story and answer the question which follows it. 

The wind pushed the boat farther and farther out to sea. It started to rain and the fog grew 
thick. The boy and his father were lost at sea. 

What happened FIRST in the story?

□ It became foggy
□ It started to rain
□ The boat turned over
□ The boat went out to sea.
□ I don’t know

Exercise 22 

(9 7-6, 132-14, 176-11, A6-7) 


Read the passage and answer the questions on the next page 

One spring Farmer Brown had an unusually good field of wheat. Whenever he saw any birds
in this field, he got his gun and shot as many of them as he could. In the middle of the summer
he found that his wheat was being ruined by insects. With no birds to feed on them, the insects
had multiplied very fast. What Farmer Brown did not understand was this: A bird is not simply
an animal that eats food the farmer may want for himself. Instead, it is one of many links in the
complex surroundings, or environment, in which we live.

How much grain a farmer can raise on an acre of ground depends on many factors. All of
these factors can be divided into two big groups. Such things as the richness of the soil, the
amount of rainfall, the amount of sunlight, and the temperature belong together in one of these
groups. This group may be called non-living factors. The second group may be called living factors.
The living factors in any plant's environment are animals and other plants. Wheat, for example,
may be damaged by wheat rust, a tiny plant that feeds on wheat, or it may be eaten by
plant-eating animals such as birds or grasshoppers. . .

It is easy to see that the relations of plants and animals to their environment are very complex,
and that any change in the environment is likely to bring about a whole series of changes.





A. What is the MAIN idea of this passage?

□ Farmers should not shoot any birds
□ Insects eat up all the farmer’s crops
□ No crops can be grown without sunlight
□ Birds eat up most of the farmer’s grain
□ All living things are affected by living things
□ I don’t know

B. The passage also points out the importance of which fact?

□ A bird is simply an animal that eats up grain
□ Wheat rust is similar to the rust on your own bicycle
□ Only living factors determine how much corn can be raised
□ How much wheat is grown depends only on how much is planted
□ Any change in the environment is likely to cause other changes
□ I don’t know


Across Age Difficulty Ranking

Overlap exercises were designed to be given to more than one age group in
order to allow a comparison of abilities by age; some were
administered to all four groups. These overlap exercises present special
problems for the assignment of difficulty levels. Most overlap exercises are
classified as uniformly easy, medium, or hard across several age groups, in
contradiction to NAEP’s claim that the exercises take into account increasing
sophistication with age in the use of skills. While a particular overlap item
might remain easy, medium, or hard at all age levels, most would seem to be
of varying difficulty to nine-year-olds, thirteen-year-olds, and
seventeen-year-olds. For questions of difficulty and age,
NAEP should give some attention to the theories of cognitive
growth and related testing efforts.

Artificial Elements. There is inconsistency in the
manner in which exercises are made difficult. Some Reading exercises, for
example, quite legitimately rely on factors such as the
length of sentences or paragraphs, the amount of information presented,
whether the material is conceptual, or the use of
specialized vocabulary. Sometimes, however, exercises are made difficult by
the inclusion of material that is irrelevant to the
objective being assessed and serves only to distract the subject from the
assigned task. If the purpose of presenting a particular passage is to measure
the specified objective, why work at cross purposes by attempting to obscure the
correct answer or the information necessary to reach a correct conclusion?

Exercise 23
(97-16)


Read the passage and answer the question which follows it.

Colorado has many mountains. Colorado has more than 1,000 peaks two miles high. Gold was
discovered in Colorado in 1859. A total of 54 of the 69 highest mountains in the United States
are in Colorado.

Which words tell what this passage is MAINLY about?

□ Fish in Colorado
□ Hunting in Colorado
□ Mining in Colorado
□ Mountains in Colorado
□ I don’t know

For example, the objective of Exercise 23 is “Finding the main point of a
paragraph.” The paragraph presented in the exercise is primarily about
mountains in Colorado, but a non sequitur distractor sentence has been inserted
in the middle, creating a poorly written paragraph. There is no reason to
select a hodgepodge passage to assess this objective when there are
well-written paragraphs containing distinct main and subordinate ideas. If an
objective requires difficult or challenging material, the related exercises
should be constructed through the selection of innately complex material
rather than through adulteration of simple passages. Alternatively, the
exercise at hand could have been improved by including answer choices relevant
to the paragraph but still distinguishing between matching and getting the
main idea. For example, the choices might have read: (a) peaks two miles
high, (b) 54 of the 69, (c) mountains in the United States, and (d) mountains
in Colorado.


The Assessment’s View of Criterion Referencing 

In spite of these many difficulties, it is clear that the Assessment staff
considered the creation of exercises at three difficulty levels a major step in
the direction of criterion referencing. This can be confirmed by considering
the criterion referencing philosophy elaborated by the NAEP Research
Director, Frank Womer, in a research proposal to the Carnegie Corporation
entitled “Exercise Development of Criterion Referenced Materials”:


Criterion referenced exercises (items, questions) are designed
specifically to sample knowledge, skills, and understanding directly.
Their purpose is to describe what students know rather than to
rank-order students from low achievement to high achievement.
If one wants to describe achievements that almost all students
have mastered (base-line information), one must develop exercises
that relate directly to an accepted objective of instruction
(criterion-referenced) and also are very easy.23


It seems clear that Womer is more interested in finding a common core with
which nearly every student is familiar than in measuring whether students
have achieved specific performance standards. In fact, in
the quotation above, the entire distinction between objectives and behaviors
indicating criterion-level achievement of the objectives seems to be lost.

In a later paragraph, Womer reasserts NAEP’s commitment to criterion
referencing:

In its first attempt to 

Assessment selected four agencies develop criterion 

jeetives for its ten subject areas and then to develop 
referenced exercises to assess the objective 

In contrast to Womer’s claim, however, the objectives
developed are seldom behavioral and, in any event, there are no
performance criteria related to the objectives.
Womer’s proposal deals with none of the interesting and important
issues regarding criterion referencing: conceptions of validity, item
analysis, differences between subject matter areas
with respect to the possibility of defining performance criteria at all, and
others.

others WAFP asS erts its intention is to measure 

In short, it seems that although non , s something d.IIeren' 

the achievement ol objectives, ns actual c „ rclscs which sample the .asks 
NAEP, desire seems to be to deve op think students oug 

students learn to n^r - - 

to learn in school i nis p. 

students can do gcn eral sense objectues ulat t JY 

Thus, while the exercises ,t„ mC ed The individual exere 

An imcrfnl sense criterion referenced . NAEPs educa 

tional objectives and suo j 


23 April 1971, Carnegie files.

24 Ibid.





of objectives such as might be involved in a judgment of functional literacy.
Even in the basic skills subject areas, it is impossible to move from the
response percentages on individual exercises to an assessment of the
achievement of subobjectives and objectives; vital questions of coverage, validity,
reliability, consistency, and difficulty remain to be resolved. Unless these
problems are faced with energy and imagination, the Assessment’s exercises
will continue to provide only odd and disconnected bits of evidence, having
limited application to research or policy.



6 


NAEP’S MEASUREMENT 
OF BACKGROUND VARIABLES 


Several studies have been made attempting “ ' * 
ous background variables to educational achie ^ such ana lyses are still 
clear that the knowledge and techniques aval h]Stlc ation, the capacity 

in their infancy Despite increasing precision (he samc throughout 

to identify explanatory variables has remain progress made, either 

the las, several years Similarly, " ^ Ld employing vana 

through analysis or experimentation, in dwingui 

bles that are more or less mampula e that school has 

Thus, the basic conclusions of the f^Lra.ly expected, and 

less influence on educational achieve „ ^ ^ moM important findings 
that family background has more, lhls particular regard. 

Surprisingly few new msiglus have^natta ^ TALENT 

despite several m a J or d d Mathematics d , g7 „, 

(1S67), T Husens lEA^ ^ ,n England ( 967 an^^ ^ 

Countries (1967), th Fmialtty of Educational Opportunity ( ) 

the Coleman reanalys, • A Reassessment of*' 

Jencks reanalysis of ear Finally, there are the fm « g 

L,/y W ***> "JrrZ Z'L m Nine 

IEA Study of « d to measure an mcreased I numto 

- - lh ' 

variables in older to 





results are essentially similar to those of the earlier work: no strong, positive,
and consistent relationships could be found between existing combinations of
school inputs and educational achievement. Findings regarding the
importance of socioeconomic variables were essentially those of preceding studies.

Thus, progress has been painfully slow for those who are attempting to
develop a causal model of educational achievement. For those who would
attempt to draw policy implications from such research, the frustrations have
been all the greater.1

This context is of great importance in understanding the past and future of
the National Assessment of Educational Progress. Little thorough analysis
has been made of why the Assessment is measuring certain background
variables and what can and cannot be expected from such measurements.
Certainly, there has been no systematic effort to consider whether the Assessment
should attempt to measure a greatly expanded number of background
variables in future years. The only independent consideration given to these issues
has come from Martin Katzman and Ronald Rosen,2 and they merely
provide a facile inference that politics watered down the research design; they
ignore the prior questions of which research design was actually proposed or,
more importantly, what could be expected to come from the measurement of additional,
or more refined, variables.

An accurate evaluation of the politics of this aspect of the National
Assessment requires a respect for historical context and for the original intentions
and statements of the Assessment’s leaders. It must be remembered that the
first major policy discussions and meetings regarding the Assessment occurred
in late 1963 and early 1964, before any of the large survey studies analyzing
the sources of educational achievement had been undertaken. Furthermore,
it is clear from recent interviews and from reading the internal memoranda
and transcripts of meetings of that period that the two major social scientists
leading the project, Ralph Tyler and John Tukey, had very limited, and
indeed, quite realistic expectations and aspirations concerning the
measurement of explanatory variables. In essence, they expected to avoid such efforts
almost entirely.

This point was the source of considerable debate among the social scientists 


1 For an analysis of the limitations of existing survey research in this area,
see Gerald Grant, an essay review of On Equality of Educational Opportunity, in the
Harvard Educational Review, vol. 42, no. 1, pp. 109-125, February 1972.

2 “The Science and Politics of National Educational Assessment,” The Record,
vol. 71, pp. 571-586, especially pp. 579-580, May 1970.





attending the first major policy conference, December 18-19, 1963. But Tyler
and Tukey were virtually decided on the matter from the outset: the
Assessment would measure general levels of knowledge, skills, and understandings
without regard to whether those levels were the result of schooling or other
specific factors. They maintained this modest view throughout and retained
basic control over the decision since they were more intimately related to the
Assessment than any of the other distinguished social scientists present.
Excerpts from the transcript of that first policy meeting illuminate this issue, as
well as the general intent that Tyler and Tukey had for the Assessment:


MR ORVILLE G BRIM ,JR ! RuBt11 ^ ge office ol 

Tukey, regardless of wha. .he original mandate o The l Office * 
Education was, I w,II no, st, here all day h*nom£ * 
involved in appraising the outcomeso thegr^ and P , h( . 

cess of the United States which P S ^ ab|e to givc 
school system without the assessm P dravv the action 

some differentiation of the : impact o eacn ? 

implication— otherwise, why t e * be drawn without 

MR. TUKEY: But the action implication can be drawn without
assessing where the results come from.

MR BRIM I deny it, completely ^ ^ to see you deny 

MR TUKEY I will give y ou an “ a P the dub— I haven’t felt 

MR KEFPEL God, it's you’ [to/M „ 
better ,n years Bless you Go 'ahead n J be lhe case, the result 

MR TUKEY If, as we all know “ w eveI y,hing they ought 
of this assessment was that every y oughl t0> et cetera, et 

to and were behaving in all the ■ Y ' needed to improve 
cetera, we would have no S^'^have a feeling ; something 

. .mormanon comes 

MR BRIM V toundi , h^ w pmpoin. -he " beth „ if. 

approach If y states with some re focus of activity 

between ^ "TT. tThe ^blem ™e 

television or * sharpen it— but l 0 ® There is plenty of 

MR TYLER minds of many P «PI<= ^ g £ at j ca , 

s£ sf—f. Si ■£” sasss- - 

SSStf-— 





MR. BRIM: What you are suggesting, then, is—in fact, if I can
resort to a kind of disciplinary observation as a sociologist, I
would think of dividing up the sources in terms of the different
institutions of society in this sense. What you are suggesting,
Ralph, is a different cut, almost in terms of the functional reading,
mathematics, and so forth, and that the efforts resulting from
the survey should be oriented toward subject matter, like we don’t
care who does the deficit in mathematics, let’s pour it on.

My target is not subject matter, but institutions. I think this is
good to get out in the clear.

MR. TYLER: Take a field you aren’t so intimate with—the field
of public health. It is important to know, for example, that there
is a great deal of malnutrition in a community, or there isn’t, and
the question as to whether some of the kids are fed at home and
others eat in dormitories and so forth is interesting if you can find
it, but, I think, if you try to load on your initial appraisal of where
this community is in terms of valid learnings, and also the question
of finding out where they got it, it adds so much to the
problem of analysis that it defeats the whole enterprise.

I’d like to know, for example, where Negro children and rural
children are with reference to the kinds of knowledge they have,
certain basic skills they need, and so forth. Just to find that out,
which we now don’t know, would be useful in terms of further
action.

But I agree with you, if you can go on and make further analyses
to find out, in many cases it would be very difficult to find out
whether it was a deficit in the home or in the school, and so forth,
but at least you’d know where they are.

MR. TUKEY: I wouldn’t be against more detailed knowledge, but
it sounds to me it costs somewhere between ten and a hundred
times as much to get it than it costs to get the basic information,
so I will stick up for getting the basic information, first.3


MR. TYLER: If we think of the analogy in the employment field,
our first question is, do we have six percent unemployed, and the
next question might be beginning to test off in what areas are they
losing their jobs most rapidly, and so forth, what is their educational
level. But I think, if you start out by the notion of what can
you attribute to the school system, you are licked because most of
the things we value highly are such a combination of what the
homes do, the schools do, what the community contributes, too,
that it’s very difficult logically to say it’s this much


3 Proceedings of the National Testing Project Conference, December 18-19, 1963,
Carnegie files, pp. 19-22.




done in the Scarsdale schools, for example, how much of that is
done by the home, the school, how much by the excellent other
kinds of community enterprise they have? Do we have to know
that to answer the question, “Is Scarsdale relatively undereducated?”
I can think of times when you want to test that out, but I
shouldn’t want to start out by that level of analysis.
MR. DYER: I don’t see the point of starting out with the other
level of analysis.

MR. TYLER: The Congress and people in general are concerned
with such questions: Do we have a large level of people having a
relatively low level of education, and where are they and what are
the inadequacies they have? After all, for the moment, our first
question, like the question of do we have typhoid, is to find out
what the state of the nation is: where are our uneducated and are
they among the Negroes or white/rural, or European?
MR. BRIM: There is a little difference in your analogy, there,
because in the instance of typhoid, we know the cause and what
to do about it. I don’t think that’s fair in this case; we are taking
the census without knowing the cause, trying to accomplish two
things—both the incidence and an analysis of the cause at the
same time.

I think you are right, now, you can certainly do the census
without any trouble but, with a little additional data, you can also
proceed from your census data to an epidemiological analysis.
MR. TYLER: I just don’t want the impossibility of something to
cause us to finally say that it’s too big a job.
MR. BRIM: I see what you mean.

MR. TUKEY: The facts put forward, here, I don’t think are right.
People were collecting typhoid statistics before they knew the
cause.

MR. BRIM: But it didn’t do them a damned bit of good until they
knew the answer to typhoid.

MR TUKEY I wouldn’t agree with that, either People began to 
Jjjj/J nyt y after they had some indication of what the facts 
were 

MR. BRIM: After you get your prevalent statistics you begin to
make epidemiological analyses; it wasn’t until they realized the
rates were different and began to develop hypotheses—we’ve already
got the hypotheses. Let’s build them into the census and
realize, ahead of time, there are differences attributable to the
family, right—to the community and to the school systems and
let’s apportion out these and build into our plans the possibility of
an analysis of these epidemiological causes.

In this sense, now, Ralph’s point about let’s not get hung up on
it, if it comes to an issue of do we have to have this or nothing at
all, I certainly go along with the point of view, let’s start with the



census rate, because that would generate the motive power to look
into the causes.

But we are social scientists sitting here and know better than to
start with census rates.

MR. DYER: I don’t see how a census means anything unless you
begin to work with classifications, and as soon as you work with
classifications, you begin to get data that are meaningful in terms
of the worlds Brim is talking about.

MR. TYLER: You aren’t arguing with me on the value of additional
data—and I repeat what I said to Bert, that additional
data that you can build into it, that will not make the thing so
complex that it fails in the early stage, I am all for that, but I
realize that you are on touchy questions when you get into trying
to find out about the family, the notion of many of the Congress
is that this is privacy, that is nobody’s business, and so forth.

And I come back to the original one: Let’s be sure we get a
defensible level of education for different age groups that we are
talking about and as much else as we can feed into it that might
have legitimate use in analysis. This is my view of it—but not to
demand that unless you can pick out how much of that was due to
the school and how much to the home, that you wouldn’t have
anything to do with it, because I don’t think, even with the most
sophisticated data at present, we can tell, in Scarsdale, how much
was due to the home and to the school.

T A > \ } Wl11 11 one statement further — that is to say, 
i don t think this question is answerable I mean, the argument is, 
what was due to nature and what was due to nurture The biolog 
ical level has been now thrown out the window on the ground a 

i 0 BVnI n n era ^ ,0n 1116 Same th,n S ,s to be expected here 
MR. BRIM: Dr. Tukey, with all due respect, I think that’s clouding
the issue—I really do.

MR. FIRMAN: Among communities that are like Scarsdale,
though, we found real differences in the way in which they scored
on a standardized achievement test, holding Scarsdale constant insofar
as we know how to hold Scarsdale constant.

„ ,, ™ “ giving us trouble by bringing in the word 

. 1 hC a PP ro P nate question at every point, whether na 
__ aa ’ S . * e ’ Scarsdale, is what might we reasonably hope to 
and ] he “P^ncy figure doesn’t get you into all this 
sort of thing, and what we have to do is to think of some reason- 
able control variables to go into an expectancy table so the data 
get meaningful at the local level 

I see the point, Ralph of saying something in the educational 
field comparable to today s unemployment rate is seven percent, 
but you have to move down a couple of steps to where are the 




f un ' m P Io >™cm, and « ™ K to me, ifs only a quest, on 

tZl aZf r bab,UV ’ and “ SKmS “ “ «£ you start 
* * abouc lt as a P ur e science problem it really distorts 

W.JWT I warned to add one more thing Mr Chair 
an Jt seems to me, the lay mind, operate^ on a rough 
, turned correlations, and I have been impressed that we 

.»f VC , been S 0in £ ^ two directions here We have been discussing 
e theoretical and desirable We need to discuss the pragmatical 
Y practical, whether you want it to be or not the findings of such 
a purvey will be regarded as having these approximate correla 

We have, in the last forty years I should say, brought the lay
mind to believe there is a strong correlation between high salaries
of teachers and high level of preparation of teachers, good school
buildings, good laboratory equipment, and the product. The product
is just, there is some assumed correlation, and my experience
m legislation going back a long time is that sometimes a rough 
and ready, crude chunk of data some of which I might quote but 
won’t take the time to do so, which have been practically used 
with executive sessions of committees — that these crude chunks of 
data which assume, let us say, the state to mention one — a survey 
of the qualifications of math teachers in the State of California 
was a turning point in one of our major pieces of legislation.
If we are going to penetrate further into this in depth, I think
we have to devise instruments which are more acute than anything
we have now. And as Commissioner Keppel pointed out
last evening, where the lay minds represented by Congressmen
have been unable to accept these rough approximations, such as
in the case of school buildings at the elementary and secondary
level, we have had no forward movement.

Now, they have assumed some kind of correlation between superior
facilities and high education and have enacted some legislation.
But the [illegible] is narrow, may I say: math, science, modern
foreign language, and engineering.

And another [illegible], Mr. Chairman, it seems to me in your effort
to keep austerity in the proposal, let’s not forget that there is an
assumed correlation factor in the [illegible] composition of libraries and
performance, [illegible] put in the hands of the
lay mind [illegible] obviously going to be used in this
[illegible]. Whether he wants it or not, it will be teased out of this.

CHAIRMAN GARDNER Are you suggesting we should measure
facilities?

MR FLYNT [illegible] an assumption that we are not real sure
[illegible] a coefficient of a correlation of one to one between
superior facilities and outgoing product, I think we ought
to be honest enough and, as public officials, our integrity is chal-
lenged, here, if it isn’t so, and we ought to be prepared to say so.
I put it in reverse: if it’s so, then it will be very clear to the
Congressmen appropriating money that the federal government,
after all, is a problem-solving agency in the field of education.


The ultimate use of this data is going to be an attempt to
measure the product against the provision, and if there is an
assumption that superior facilities produce superior products, then I
am sure that ready acceptance will be found for this.
Up to now, that hasn’t been any too clear to the lay mind; they
have seen good teachers in poor buildings produce good students.
MR TUKEY It seems to me, this is an argument which says,
evaluation at the higher levels is eventually going to be an essential

MR FLYNT Yes, that’s right 

MR TUKEY because it seems to me, the lay mind doesn’t
believe, I think, these things are crucial for the core subjects. I
would feel that you had to come higher up the scale before you
had assumed that correlation.


MR FLYNT Oddly enough, the lay mind assumes it to be quite 
crucial for football — an absolute willingness to buy seventeen 
thousand steel seats for a stadium for a high school football team, 
a band room, a small choir room, and very expensive storage 
spaces for instruments There is some correlation between 
some kind of equipment. We have just taught the schools that
there is a correlation between effectiveness of science teaching and
laboratories; in fact, the Office of Education spent some millions
of dollars on improving the science laboratories in secondary
schools, which has the effect of getting the teachers in and holding
the teachers, without regard to salary and anything else, simply
by improving the working conditions.

All of these things are apparent in what Dr. Tukey is saying;
it seems to me inevitably you are driven to this high level evaluation
of this thing, whether you want to or not.
MR ROSSI It seems to me, we ought to play the game straight, it
[illegible] out the right way, we are stuck with it.
MR FLYNT That’s right, it is. This is, essentially, what I’d say. I
have struggled with this for twenty to twenty-five to thirty years,
and I am not at all sure some of these assumed correlations
which are tentative and haven’t been accepted would, if thrust
through, prove out; we have seen my own field of history. You go
to the English public school, to Eton or Harrow, the equipment
and instructional facilities are antiquated in the extreme, the
buildings constructed in the fifteenth century, and not very comfortable
[illegible] frozen to death [illegible]
around this table [illegible] that we are, these are the leading
exponents of the greatest wisdom that could be accumulated
in one room, and we are challenged, here,
at the high level to consider what the meaning of all this is.
If it is true that better prepared teachers produce better pupils
[illegible] of socioeconomic status, we’d better know it; if it isn’t,
we’d better start some sort of school organization at the local level.


MR THORNDIKE The last comment suggests maybe whenever
we go into a school and test the pupils we might have a parallel
set of tests to be given to the teachers.
MR FIRMAN In the ten schools I mentioned last night, we found
some to be universally good and others to be universally ineffective
[illegible] teaching basic skills. In the good schools fifty-five percent
of the staff had masters’ degrees and in the poor schools twenty-four
percent had masters’, and I think that’s significant.
MR TUKEY But, is it significant about the teachers or about the
policies of the board of education? 4


These discussions, aside from suggesting the inadequacies of making policy
judgments through verbal discussion, also provide evidence on several other
crucial matters. Francis Keppel, then Commissioner of the Office of Education,
had earlier asked for the Assessment to provide hard data with which to
approach Congress, particularly regarding relative inequalities in educational
achievement and also concerning the absolute levels of achievement in
various subject areas. Tyler and Tukey were suggesting, on technical, practical,
and political grounds (in that order of salience), that the Assessment
should concentrate on measuring the educational equivalents of the incidence
of malnutrition, typhoid, unemployment, etc. Further, Tyler and Tukey
agreed all along with Cronbach’s analogy that one also needs to know “where
the pockets of unemployment are.” Thus they would attempt to collect some
background data about the students so that comparisons could be made of
levels of knowledge, skills, and understandings among a variety of distinct
groups, in each of the subject areas and at each of the age levels. And
finally, it was clear they did not expect the Assessment to determine why the
pockets were where they were, or how they might be eliminated.

The issue raised by Mr. Flynt of the U.S. Office of Education in the final
excerpt quoted is a subtle and critical one which never was resolved. What he
was suggesting elliptically was that merely determining the inequity of educational
achievement among a variety of subgroups might lead to a greater
recognition of problem areas in the field but that it would also lead to an
undiscriminating invocation of the many assumed correlations which then
dominated the determination of educational policies. In other words, it might
mean that more money would be poured into teacher education, building, or
other educational inputs which in fact could have inconsistent effects on
educational achievement but which were then popularly assumed to produce
inevitable gains. Thus, he was suggesting that it would be necessary to test the
assumed correlations and find out which of them held and which did not. At
the time (prior to Coleman), it might have been reasonable to assume that
some relationships between educational inputs and educational achievement
would hold, and that the more manipulable of these could be employed to
produce greater achievement and greater equality in American education.
Flynt’s concern was particularly perceptive and timely, but in fact if the
Assessment had tried to gather input data to resolve this issue, by the time it
issued its first reports in 1970 it merely would have added one more voice to
those that have since demythologized the assumed correlations. It would not
have been able, with techniques available either then or now, to identify any
variations which could produce strong, positive, and consistent variables with
which to guide systematic educational policy making.

4 Proceedings of the National Testing Project Conference, pp. 139-145.

The Tyler-Tukey position, doubting the utility of attempts at causal analy-
sis, prevailed virtually unchanged from the first meeting in 1963 until very
recently. There is one exceptional moment worth noting, however. Stephen
Withey, the first staff director for ECAPE, compiled a foresighted memo
during 1964 5 


INPUT FACTORS 

There seems to be no systematic treatment of the educational
system input factors that influence educational progress. A publi-
cation such as Evaluating Criteria, 1960 edition, of National Study
of Secondary School Evaluation (Washington, D.C.) hints at
some of these items in its comprehensive enumeration of criteria
for evaluating high schools. But many are missing because much
of the input is somewhat independent of the educational offering
itself. Input factors run all the way from psychological factors to
sociological conditions. The bearing of these areas on education is
still a question with changing, developing, and innovative an-
swers.


5 “Input Factors,” ECAPE memorandum, Denver, Carnegie files.






The listing of input factors raises questions as to which variables
would be most valuable in an assessment such as that proposed.
Thinking of the items in terms of research methodology
also raises questions on the multiplicity of sources of information,
the relative cost of securing items, informational comparability,
and the degree to which these factors reflect each other. A further
question asks which items are most meaningful when only
pupils will be sampled from the area described, and so on.
The following list, culled cursorily from individual studies, is
suggestive of the multiplicity of items:


Pupils

1 Sex of student

2 Age and grade

3 Years since starting school

4 Pupil potential as expressed on standard[illegible]

5 Educational readiness or measures of preschool preparation

6 Pupil motivation, interests, values, and aspirations

8 [illegible] pupil

9 Size of class in which pupil is a member

10 Special status of pupil

Family

1 Intactness of family

2 Size of family

3 Parents’ expectations and academic interests

5 [illegible], etc.

9 Educational [illegible]

10 Socioeconomic [illegible]

Community

1 Size and population [illegible] main economic supports

2 Type of community

[illegible] school financing

5 Climate of [illegible] educational attainment

[illegible]





10 Rate of community growth 
Teachers 

1 Teacher turnover or average length of tenure 

2 Teacher preparation 

3 Teacher salaries: some measure of the distribution or the po-
sition of the pupils’ teacher in the salary distribution

4 Teacher identification with the community 

5 Teacher-pupil ratio 

6 Teacher benefits 

7 Teacher expectations for her pupil (s) 

8 Length of teaching experience

9 Certification of teachers 

10 Teacher qualifications above the minimum 

11 In-service teacher training

12 Teacher pupil relationships 

School System 

1 Class size 

2 Size of school system and rate of growth 

3 Special services, e.g., guidance, health, library, etc.

4 Instructional supplies and equipment 

5 “Desirability” of school plant 

6 Professional preparation of school administrators 

7 School evaluation procedures 

8 School retention rates 

9 Ethnic and racial composition of the school

10 School climate 

11 Breadth, scope, and organization of school offerings

12 School policy (promotional procedures, etc.)

13 Public or private school system 

14 Formal and informal school relationships 

15 Quality of administrative leadership 

16 Preschool services 

17 Compensatory school services

18 In-service program

19 Articulation and coordination of curriculum 

20 Expenditures per pupil 

21 Capital outlay 

22 Budget 

23 State and federal aid versus local financing 

24 Proportion of budget spent on salaries 

25 Assessed valuation of school district 

26 Percent of funds obtained by local taxation 

27 Staff morale 





28 [illegible] of responsibility within the school

29 [illegible] the learning situation [illegible]

30 Specialized education offerings

31 [illegible], progress, and graduation policies and practices

32 Objectives of the school

33 Appraisal of student progress

34 Relationship of school to other agencies providing education

35 Relationship of educational programs to current knowledge of
the nature and needs of youth in the community

36 Relationship between teachers and pupils

37 Relationship between administrators and teachers

38 Attitudes of staff toward students

39 Age-grade distribution within the school

40 Proportional educational and occupational intentions of students


Whatever the desideratum of this memorandum, the effect could only have
been to support the Tyler-Tukey position that such an analysis was too much
to attempt. Indeed, none of the survey research projects since that time have
been able to develop adequate measures for many of the inputs which were
listed; most notably, attempts to measure the various affective factors mentioned
(Pupils, 5, 6, and 7; Family, 1, 3, 4, 5, 6, 7, and 8; Teachers, 7 and 12;
School System, 7, 10, 28, 29, 32, 33, 34, 35, 36, 37, and 38) have been totally
lacking or almost totally inadequate.


DESCRIPTIVE DATA VERSUS EXPLANATORY 
OR PREDICTIVE DATA 

Thus, NAEP eventually took a relatively descriptive stance toward background
data. The units for reporting would be those that would allow for the
gross identification of pockets of more or less educational achievement and
progress. This distinction is critical. Most of the background factors to be
recorded were intended merely to describe the various subgroups; they were
not to be used either to explain or predict levels of achievement, with the
possible exception of the SES measures. (Surely it is easy, as [illegible],
for descriptive data to be used and/or abused in drawing predictive or
explanatory [illegible].)

Thus, while politics [illegible] which particular background
factors would be described, Katzman and Rosen and other critics of
the Assessment’s design are wrong in their conclusion that politics was the key
factor preventing the Assessment from becoming a rigorous input-output system
of national educational accountability. As noted earlier in this chapter,
the startling but essentially negative “schooling” findings of the Coleman
Report and subsequent studies suggest that Tyler and Tukey were astute and
judicious in their recognition of the limitations of social science as a means of
providing a comprehensive and sensitive causal model of educational
achievement.


THE SELECTION OF DESCRIPTIVE CATEGORIES 


With this descriptive intent in mind, ECAPE and NAEP decided to record
information by sex for four age groups, four regions of the country, four types
of communities, and two levels of SES. Later, after various political considerations,
a decision was made to record race as well, in two broad categories:
black and nonblack. These categories provided the Assessment’s first round
with a potential of 512 descriptive subgroupings (2 sex × 4 age × 2 race ×
4 regions × 4 community types × 2 SES levels).
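
The count of 512 is simply the product of the number of levels in each reporting category. A minimal sketch of that arithmetic follows, assuming the age, region, and community labels used in the NAEP description quoted in this section; the sex and SES labels ("male", "female", "lower", "higher") and the function name are placeholders introduced only for illustration.

```python
from itertools import product

# Levels per reporting category. Age, region, and community labels follow
# the NAEP description quoted in this section; the sex and SES labels are
# hypothetical placeholders, since the text gives only the number of levels.
CATEGORIES = {
    "sex": ["male", "female"],
    "age": ["9", "13", "17", "25-35"],
    "race": ["black", "nonblack"],
    "region": ["Northeast", "Southeast", "Central", "West"],
    "community": [
        "large city",
        "urban fringe",
        "middle-size city",
        "small town-rural",
    ],
    "ses": ["lower", "higher"],
}


def enumerate_subgroups(categories):
    """Return every combination of one level from each category."""
    names = list(categories)
    return [dict(zip(names, combo)) for combo in product(*categories.values())]


subgroups = enumerate_subgroups(CATEGORIES)
print(len(subgroups))  # 2 * 4 * 2 * 4 * 4 * 2 = 512
```

The figure is a potential maximum: it says nothing about whether the sample places enough respondents in each cell to report results for it.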

The NAEP description of its first reporting categories is as follows: 6


Various reporting categories have been and are being developed.
The basic categories are the four different age groups: 9, 13, 17,
25-35. In a few instances the same exercises will be used across
three or even across all four age groups. In many instances the
same exercise will be used at two different ages. Thus it will be
possible to see some comparative data across two or more age
levels. The choice of age groups to assess was made in order to
sample near the end of primary education, near the end of elementary
education, near the end of secondary education, and
after most adults have completed all of their formal education.
A second set of reporting categories is by geographic region.
Four regions were used in the sampling; the same four will be
used for reporting purposes: Northeast, Southeast, Central, and
West. The Northeast includes all the middle Atlantic and New
England states. The Southeast contains most border states between
the North and the South [illegible] western and West Coast states
plus Hawaii, Alaska, and West Texas. The Central area contains
the other states. These divisions correspond closely to geographic


6 What Is National Assessment? (Denver: ECS, 1970), pp. 39-41.





[illegible]


Large cities (above 200,000 population)

Urban fringes (cities adjacent to the large cities)

Middle size cities (25,000 to 200,000)

Small town-rural areas (below 25,000)

In addition to these basic reporting categories, additional information
about community size and type was collected. Thus it
may be possible to report finer breakdowns by separating out
central cities areas or truly rural areas if the data warrant.
A fourth reporting category will be sex. Numerous studies have
demonstrated that boys and girls produce different results in various
subject areas. Such differences certainly will appear in National
Assessment results.


A fifth reporting category is labeled SES. Originally it meant
socioeconomic status. It now is read as socioeducational status.
Neither term may truly represent the intent of this breakdown.
The intent was to be able to report results separately for assessees
from disadvantaged homes. The great concern of contemporary
society with the education of the disadvantaged requires an all-out
effort to provide information about the knowledges and skills
of that group as they exist today.

Defining SES or describing the intent of the classification is
simple. Finding a good index or indices of SES is extraordinarily
complex. The literature of educational measurement yields numerous
attempts at measuring SES, each one of which has some
major flaw. Ideally one might want to classify assessees according
to parental income. In practice such information cannot be collected,
as it verges upon invasion of privacy. Obvious substitutes
are educational levels of parents and/or occupational levels of
parents. Such information can be secured relatively easily for
17-year-olds by direct questioning. But 9-year-olds are not apt to
have the information. One can consider the use of existing school
records (complete in some schools, incomplete in others), or one
can consider trying to get the information directly from parents (a
tedious and only partially successful scheme). Or one can ask a
series of simple questions that 9-year-olds and 13-year-olds can
answer, such as whether a home contains an encyclopedia or a
daily newspaper or books, etc., and infer family educational level
from such indices. National Assessment is trying all of these approaches
[illegible] one or more of them will provide a
meaningful breakdown into two or more meaningful SES levels.





The final reporting category is race. This category was added
to the reporting scheme less than two years ago at the urging of
persons concerned with obtaining maximum information about
minority groups. It is a controversial category that offends some
people if included and offends others if omitted. The policy committee
for National Assessment felt that the need for this type of
information outweighed the dangers inherent in collecting it.

An additional practical reason for including race as a reporting
category is the fact that it offers an additional category to be used
in connection with low SES reports. Many persons assume, incorrectly,
that most low SES individuals are members of a minority
group. Statistics, however, say that more members of the white
majority in this country are low SES than are members of minority
groups in the country as a whole. The use of race as a separate
reporting category will enable one to look at SES and race both
independently and together.

Unfortunately, the small size of the National Assessment sample
means that meaningful statistics will be available only for
black, white, and other or for simply black and other. The designation
of race is being made by the exercise administrators. No
individual assessee is asked to indicate his race. While this is not a
perfect categorization, no other scheme is perfect either. It is a
categorization that is close to common usage.
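
The four size-of-community categories in the description quoted above amount to a simple classification rule. The sketch below is illustrative only: the function name, the adjacency flag, the handling of places with exactly 25,000 or 200,000 people, and the choice to let adjacency take precedence over size for places at or below 200,000 are all assumptions, since the quoted description does not settle them.

```python
def community_type(population, adjacent_to_large_city=False):
    """Classify a place into one of the four first-round categories.

    Assumptions made for this sketch (not stated in the source):
    a population of exactly 200,000 counts as middle size, exactly
    25,000 counts as middle size, and any place at or below 200,000
    that adjoins a large city is treated as urban fringe.
    """
    if population > 200_000:
        return "large city"
    if adjacent_to_large_city:
        return "urban fringe"
    if population >= 25_000:
        return "middle-size city"
    return "small town-rural area"


print(community_type(750_000))
print(community_type(40_000, adjacent_to_large_city=True))
print(community_type(40_000))
print(community_type(3_000))
```

A rule of this kind only sorts places by size and adjacency; it carries none of the occupational information added to the categories in later rounds.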


By the Assessment’s second round, the Reading and Literature exercises,
NAEP had attempted to produce somewhat more refined descriptive categories.
The measures of age, sex, and regions remained the same, but the measure
of race was expanded to include “Black, White, and Other”; the four
sizes of communities were substantially altered to include seven “Sizes and
Types of Community,” which provide a combined measure of size and
“community-occupational” status (STOCS); and the two levels of parental education
were expanded to four. The General Information Yearbook, 02-GIY, describes
the STOCS changes. Although these refinements are sophisticated, they in
no way alter the basic intent or value of the categories, which remain largely
descriptive, and should not be taken to be explanatory or predictive. The
question of which descriptive reporting categories are likely to prove most
useful is important. For instance, what are the implications for educational
change of findings associated with sex, race, or parental level of education?


7 [illegible] collected information on whether the respondent’s
family had a telephone, an encyclopedia, an automobile, a daily newspaper delivered,
subscriptions to any magazine, and more than thirty books in the home. The
telephone and automobile questions were eliminated in later cycles.
8 Denver: ECS, 1972, app. D.





[illegible] of federal and state [illegible] the development
[illegible] and [illegible] consistent differences show up among these
different sizes and types of communities, it is conceivable that particular
categorical aid programs might be useful in these different places, although
the Assessment could provide little guidance as to what those programs should
consist of. Data comparing specific communities and specific states might have
produced more direct information and impetus to change. After all, many
policies are implemented by local districts or state legislatures and state
departments of education, and directly relevant data might have helped them
develop these policies. However, the states and local districts can collect
some comparative data on their own, and some of NAEP’s existing reporting
categories (such as sex, race, and STOCS) may be helpful in a general way in
the determination of state and local policies.

The ECAPE-NAEP decision not to sample and report the results by specific
states and communities is quite complex, and it too has generally been
misinterpreted, like the decision to undertake descriptive rather than explanatory
analysis. For example, as noted earlier, Katzman and Rosen have argued
that politics prevented NAEP from collecting state-by-state data, which
is true, but they have failed to inquire whether such collection was seen as a
crucial objective or as relatively debatable.

Several facts are worth noting here. The transcripts, memoranda, and interviews
suggest that Tyler and Tukey had no rigid expectation at the outset
that they would gather comparative data by particular states and communities.
Their primary interest was in establishing a national census of educational
progress. They indicated a desire to report subpopulation data for the
sake of comparisons and identification of pockets of special need, and Keppel
at one point stated that he hoped the Assessment would promote healthy
competition among communities and states (while avoiding invidious comparisons).
In one instance, the January 27-28, 1964, conference, providing
states with comparative data was listed as a major tentative aim. But in fact
many at that meeting assumed the states and communities would actually
crave such intelligence about themselves, and in terms of the current accountability
movement this assumption was partly true. But even given an overly
optimistic sense of state and local demand for such information, Tyler and
Tukey in no way indicated that states and communities were high priority
reporting categories. Their flexibility or ambivalence on this matter is reflected
in the fact that they considered numerous alternative reporting categories





view, the existence of NAEP puts the “impossible dream” well
into the realm of possibility.

Two opposing observations need to be made. First, this activity
is clearly outside of NAEP’s original charge and would possibly
endanger NAEP’s capability to carry out its primary task of providing
periodic unbiased assessments of the nation’s achievement
in education. On the other hand, the widespread demand for
information for building social legislation to provide equality of
educational opportunity has already moved NAEP and the Analysis
Advisory Committee in particular to try to quantify the “effects”
of such factors as parents’ education, region, and race. The
data currently collected that characterize groups of respondents
present serious problems of interpretation because of their limited
scope and may be quite misleading, particularly in the hands of
others.

Because the existing data will be used in spite of serious drawbacks,
one responsible position may be for NAEP to collect data
as carefully and with as much thought as it is now collecting on
achievement itself.

To provide data of this sort coordinated with the present data
on achievement will require an effort of about the same magnitude
as was required to develop NAEP’s present system. However,
the methods used and the experience gained by NAEP in devising
the assessment scheme will of course be of great aid in developing
the new variables and methods required.

The following is a brief description of some of the issues that
might be involved in this enlarged study:

1 We would get background information, probably on a subsample
of the total NAEP sample, and we would probably follow the
present rule of not asking all questions of all the children.

2 We would wish to determine some additional things about the
child, such as IQ, attitude, hearing, sight, and general health.

3 We might wish to obtain some special school-child variables,
such as the teacher’s assessment of the child, the length of time the
child has been in that school system, attendance record, level of
discipline, whether the child was bused.

4 We would need to follow the child home to determine the
makeup and life-style of the home, income, attitude of home to
school, expectations for the child, availability of age to learning,
help with homework, etc. How does the child relate to parents
and the establishment?

There is no shortage of experts to provide NAEP with opinions
as to which variables would be most valuable to measure and
what the best procedure might be in order to go about measuring
them. Some possible mechanisms for tapping this expert opinion
are:




1 Have conferences of interest groups 

2 Contract to special firms such as SRI or AD L 

3 Obtain the services of a School of Education 

4 Assemble an in house group at NAEP 

5 Set up an ad hoc group of consultants to work on this problem 

NAEP must have a well defined charge to give to any such
group; however, its success in defining measures of assessment
would be one of NAEP’s strongest assets for this task.

In summary, we find ourselves, at a time when society is eager
to work on the problem of improving educational achievement, in
possession of a valuable tool for contributing to this goal. If we
continue to collect descriptive variables of the child and his position
in society in an ad hoc manner, the results will still be used,
possibly by ourselves as well as others, to try to solve problems for
which they are not appropriate. If we can correlate our data on
achievement with fairly complete data on probable determinants
of that achievement, we may be in a unique position to shed light
on the educational process. 9

Despite the fact that this memorandum comes from two of the nation’s
leading statisticians, or perhaps because it does, it is very striking in its
similarity to the amorphous and unsystematic planning memos which preceded
and generally characterized the development of the Assessment itself.
There are four points here which seem intuitively and empirically defensible.
First, more data on factors affecting educational (hopefully more broadly
defined) achievement are desperately needed. Second, in the long run
randomized trials might be very helpful in isolating important factors. Third,
NAEP’s sample population or some subsample thereof might be useful when
the time for such randomized trials comes. 10 Fourth, as we noted above and
as we discuss in Chapter 7, the existing NAEP reporting categories have
created “serious problems of interpretation because of their limited scope and
may be quite misleading particularly in the hands of others.”

But there are no logical or empirical grounds for suggesting either that
NAEP would be the appropriate sponsoring agency for the effort that Gilbert


9 June 22, 1972, Carnegie files.

10 Eventually, as the definition of educational achievement is consistently
broadened to include more than merely scores on achievement tests, [illegible]
exercises measuring actual skills and understandings might be [illegible]
now these measures are quite unrefined. Superior measures seem [illegible]
from other smaller scale efforts to produce valid and reliable measures [illegible]
understandings that might reveal interesting relationships with the determinants
[illegible] educational growth.





and Mosteller propose or that the next appropriate step in the field of analyzing
the determinants of educational achievement should be a large observational
study. Indeed, there is a great deal of evidence to the contrary.
In the first place, as previous sections of this study have indicated, even
given its relatively limited intent and mandate, NAEP has had a wide variety
of serious problems, in terms of both management planning and quality control.
It is surprising that Gilbert and Mosteller suggest that NAEP has “perfected
and tested tools for assessing educational achievement.”

Second, even if NAEP were to correct its many inadequacies and then
demonstrate that it has some real utility that warrants its cost to the federal
government, such a demonstration would not prove, in any way, that it has
the technical and political capacity to oversee or undertake the sort of
pathbreaking research analysis that is necessary. NAEP was started in 1963 to
provide hard data for USOE, Congress, and a variety of educational interest
groups. Various people wanted stronger proof of diverse conditions in education.
But, in fact, standards for proof have increased so considerably in the
past several years that NAEP’s hard data is actually quite soft; as considered
in more detail in Chapter 7, it does not hold up under the kinds of public
scrutiny that it was originally intended to withstand. Stated differently, as
standards of measurement and evidence have become more sophisticated,
NAEP has been unable to keep up. And there is much to suggest that it will
be difficult for NAEP to catch up to current standards, not to mention keeping
up with standards as they continue to rise in the future. To ask this same
organization to move ahead in the extremely difficult area of causal analysis,
given its administrative, technical, and political limitations, seems clearly to
be asking too much.


Third, and perhaps most importantly, there is no evidence that the next
logical step in the field of causal analysis is a large observational study, or
even that one next logical step is a large observational study. The memo’s
conclusion that “thus, from a technical point of view, the existence of NAEP
puts the ‘impossible dream’ well into the realm of possibility” is preposterous,
both when contrasted with Tyler’s and Tukey’s humility on this subject and,
more significantly, when placed against the findings and conclusions of the
most sophisticated attempts at causal analysis completed to date.

Impending omniscience is clearly much less a reality than a vocational
[...] statisticians. As indicated above, surprisingly little progress has
been made as a result of several large observational studies and numerous
[...] following publication of the Coleman Report. Despite substantially
increasing investment and rigor, the subsequent studies in the United
States and abroad have failed to move rapidly ahead toward isolating
manipulable factors that positively, significantly, and consistently affect
educational achievement.

The memo, before its statement about fulfilling the impossible dream,
does recognize that progress in this field may be extremely difficult. The
stated aim would be to identify the variables affecting achievement and
insofar as possible to quantify their effects and interactions (emphasis added).
On the other hand, this seems to ignore the existing studies identifying such
variables and the limited uses (and in some cases limited analyses) of these
studies. IEA has data on educational achievement in twenty-two countries,
which include measurement of a total of some eight hundred (as yet
underanalyzed) background variables. And yet its tentative conclusions regarding
the relative importance of family background factors and the relative
unimportance of school factors are much the same as those which have been
repeated again and again over the past several years. The memo,
does not suggest that it will be feasible to move beyond [...] more
sophisticated understanding [...] variables. Why would one move ahead with
another [...] study before the presence of some rigorous evidence [...] not
empirical—that the study would advance [...]?

The memo itself provides one [...] ahead with such a study. It correctly
suggests that NAEP reporting categories are subject to serious
misinterpretation and [...] on to argue that, if only NAEP could collect a
great deal [...] data, this problem would be remedied. First, this position
[...] distinction between reporting the results in terms of descriptive [...]
and analyzing them in explanatory or predictive terms. Gilbert and Mosteller,
despite their connections with the project, appear to [...] the earlier NAEP
intent when they suggest that NAEP and the Analysis Advisory Committee have
been trying to quantify the effects [...] as education, region, and race.
It [...] claim this. In all its publications NAEP has been [...], though often
unsuccessfully, to avoid any such [...]. [...] misinterpret NAEP’s intent and
to allege now that if only [...] would do more of what it is not doing [...]
done better [...].

Furthermore, Gilbert and Mosteller’s suggestion that complicating the
analysis by measuring more variables would mitigate against misinterpretation
[...] the most sophisticated studies to date have filtered [...] the public at
large in terms of a few gross generalizations about the nominal effects of
school resources and the significant effects of family background. Indeed, the
more sophisticated the analysis preceding the conclusions, the more likely it
seems the conclusions will be misinterpreted and abused with increasing zeal.
There is no doubt that NAEP’s attempt to be purely descriptive is thwarted
by various interpreters once the results are in the public domain. To some
extent this seems an inevitable problem, but insofar as it is not, the realistic
alternative policies available to NAEP include (1) rigorous refusal to suggest
causal factors and (2) specific, thoughtful attempts to interpret results,
repeatedly emphasizing their descriptive nature and what should not and in
fact cannot be inferred from them.

One final set of remarks regarding the June 22 proposal. First, the
description of some of the factors that might be involved in the study seems quite
unimaginative and redundant from the start. Second, school factors, either
because of politics or pessimism concerning the future, have been omitted from
the list of priorities. While this would have to be corrected in future efforts
before they could be considered complete, even then it would be shortsighted
to undertake another extensive survey of the effects of school factors without
first redefining school outcomes and inputs. It is known from past studies that
the present variation between the schools measured is too slight to produce
strong, consistent, positive relationships between school factors and
educational achievement.


The memo casually suggests that this new analysis would make use of
NAEP’s matrix sampling system, which is described in the next chapter.
Naturally, one of the only reasons to ask NAEP to make such an effort would
be to make use of its sampling mechanism, but at the same time this notion
introduces subtle but severe complications. The goal would be to build a
comprehensive theory of the determination of individual educational
achievement, but instead of studying individuals in depth and over time, the
method would aggregate highly fragmented data on individuals
representative of population subgroups.

This approach seems unnecessarily convoluted from the start. The use of
this sampling system for these purposes would create new and unexamined
inference problems more complex than those which Coleman and all the
subsequent reports have faced and failed to overcome. Measuring the
determinants [...] by following an individual or a group of individuals, in
depth, from Time One to Time Two, so far has proved impossible. Aggregating
data about individuals studied in less depth has been even less feasible. As
James Coleman noted in 1964:





It is painfully evident to anyone who attempts to study a social
system that our quantitative research techniques are in their
infancy. For, by sensitive observation and description (as
exemplified, say, by William Foote Whyte’s Street Corner Society) we can
trace the functioning of a social system. Yet when we attempt to
carry out quantitative research in such a system, we find ourselves
stymied. We switch from a sensitive examination of events, in
which intimate sequence in time suggests causal relations between
events, to a crude measurement of characteristics and a
comparative cross-sectional analysis that relates one characteristic to
another. That is, when we shift from qualitative reporting to
quantitative analysis, we change our very mode of inference.


In the years since the above statement was written, little progress has been
achieved in improving quantitative techniques enough so that this change in
the mode of inference can be fully understood and accounted for. And now, in
the June 22 NAEP memo, comes the suggestion that the next step [...]
aggregation of various partial quantifiable characteristics of individuals,
none of whom would be studied in any depth. [...] direction
involves still another change in the mode of [...], moving away [...]
important problems suggested by Coleman [...] and without proposing to
resolve them. [...] before the more obvious practical difficulties of [...]
project [...], there is a substantial burden [...] matrix sample
has any superior virtues to recommend [...] involved in [...] analysis at this
time. It is much more likely that the next breakthroughs toward a
causal analysis of educational achievement will [...] from in-depth studies of
various individuals and from [...] based on imaginative
efforts to introduce new inputs or [...] in substantially different
proportions.


11 “The Adolescent Society,” in Phillip E. Hammond (ed.), Sociologists at
Work (New York: Basic Books, 1964), p. 192.



7 


THE SAMPLING DESIGN
AND EXERCISE PACKAGES


The Assessment’s matrix sampling system is one outcome of
the NAEP intention to measure the knowledge, skills, and understandings
of groups rather than individuals. While the matrix design was carefully
fashioned to yield precise estimates of group responses on the exercises, as we
shall see, it imposes some serious limitations on the usefulness of the
Assessment results.


SAMPLING 

The NAEP probability sample was designed to represent subgroups of the
population, classified by sex, age, region, size and type of community,
socioeconomic status, and race. NAEP sought a sampling system that would
permit subsequent manipulation of the data in order to obtain estimates for
subpopulations not defined in the original design, and, in addition, NAEP
wanted a sample that would facilitate smooth field operations, provide simple
estimation procedures, and achieve these aims at the lowest possible cost.

[...] objectives led to the selection of a stratified multistage
design. In the first stage of a multistage design, the population to be


1 For [...] account of the NAEP sample design, see NAEP Report [...]
(Denver: ECS, 1970), app. [...]




sampled is divided into cells, and a sample of cells is drawn. In the next stage,
samples are drawn from the populations of each of the cells selected in the
first stage.

For the Assessment, the first stage involved construction of an area sampling
frame, in which every square foot of land area in the United States was
assigned to a listing unit — a small geographic area with a recognizable
boundary (such as a county) and containing at least 16,000 persons. Each listing
unit, depending on its population, contains one or more Primary Sampling Units
(PSU’s), which are units of roughly uniform population. Listing units were
stratified by region, size of community, and a combination of income and
geographic location within the region.

NAEP determined that about 2,000 responses per exercise for individually
administered exercises and about 2,500 responses per exercise for group-
administered exercises would provide a sample large enough [...] small
differences in attainment over time [...]. The staff decided, on the basis of
variance and cost considerations, [...] administer each exercise package to
nine to twelve respondents [...]. This led to the decision to select 208
PSU’s in all, fifty-two in [...]. For each of the four regions, then, the
first-stage sample was selected by drawing PSU’s from size-of-community
strata on a proportional-to-population basis.
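
The arithmetic behind these figures can be checked with a short sketch. It assumes, as a simplification of ours, that every package reaches its full quota of respondents in every PSU and that there is no nonresponse; the function and variable names are illustrative and are not taken from any NAEP document. The quotas of nine and twelve are those given in this chapter for individually administered and group-administered packages.

```python
# Figures taken from the chapter; the code itself is only an illustration.
REGIONS = 4
PSUS_PER_REGION = 52


def responses_per_exercise(respondents_per_package_per_psu: int) -> int:
    """Responses to one exercise if its package reaches a fixed number of
    respondents in every PSU (assumes no nonresponse)."""
    return REGIONS * PSUS_PER_REGION * respondents_per_package_per_psu


total_psus = REGIONS * PSUS_PER_REGION          # 208
individual = responses_per_exercise(9)          # 1872, near the 2,000 target
group = responses_per_exercise(12)              # 2496, near the 2,500 target

print(total_psus, individual, group)
```

Under these assumptions the design yields 1,872 and 2,496 responses per exercise, close to the stated targets of about 2,000 and about 2,500.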

The second stage involved sampling the [...] selected
PSU and determining weighting factors. The [...] design [...]
several separate surveys, one for each [...] level.
About 150 respondents were chosen for [...] each PSU.* For
seventeen-year-olds and adults, low-SES respondents were deliberately
oversampled in order to provide a large enough sample [...] to make
reasonable estimates.

PACKAGING THE EXERCISES 

Exercises from [...] packaged together in
packages, one of which [...] respondent. Most

* [...] each PSU [...] seventeen-year-olds [...] complete up to four
[...] for completing [...] three, and $20 for completing four.





packages contain between ten and fifteen exercises and require thirty-five to
forty-five minutes to complete. Each individually administered package is
given to nine students in each PSU, and each group-administered package is
given to twelve students in each PSU.

The exercises and instructions, in addition to being available in written
form, are read to the respondents by a tape-recorded voice in all subject
matter areas except reading, for all ages except adults. The packages are
administered by NAEP staff or by local staff who have been specially trained
by NAEP.
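
A minimal sketch of this kind of package assignment within a single PSU follows. The pool of 150 respondents, the count of ten packages, the figure of twelve exercises per package, and all names are hypothetical choices made for illustration; the sketch is not NAEP's actual assignment procedure, only one instance of giving each package to a fixed quota of respondents, each of whom takes a single package.

```python
import random


def assign_packages(respondent_ids, package_ids, per_package, seed=0):
    """Give each package to `per_package` distinct respondents.

    Each respondent receives at most one package; respondents left over
    after all quotas are filled receive none.
    """
    needed = len(package_ids) * per_package
    if needed > len(respondent_ids):
        raise ValueError("not enough respondents for this many packages")
    rng = random.Random(seed)
    shuffled = list(respondent_ids)
    rng.shuffle(shuffled)
    assignment = {}
    for i, pkg in enumerate(package_ids):
        for rid in shuffled[i * per_package:(i + 1) * per_package]:
            assignment[rid] = pkg
    return assignment


# Hypothetical PSU: 150 respondents, 10 group-administered packages.
respondents = [f"r{n:03d}" for n in range(150)]
packages = [f"pkg{n:02d}" for n in range(1, 11)]
EXERCISES_PER_PACKAGE = 12  # hypothetical, within the ten-to-fifteen range

assignment = assign_packages(respondents, packages, per_package=12)

# How many respondents took each package.
counts = {}
for pkg in assignment.values():
    counts[pkg] = counts.get(pkg, 0) + 1

# Share of the whole exercise pool that any one respondent sees.
share_seen = EXERCISES_PER_PACKAGE / (len(packages) * EXERCISES_PER_PACKAGE)

print(counts)
print(share_seen)
```

Every exercise accumulates its quota of responses in the PSU, yet under these hypothetical numbers each respondent sees only a tenth of the exercise pool.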


THE INTERPLAY OF SAMPLING, PACKAGING, 
AND UTILITY 


The remarks below concerning various limitations of the data collected by
the Assessment are not intended to be a criticism of the sampling system or
packaging criteria per se. The sample design and packaging did not create the
limitations we discern. Rather, limitations in the original conception of how
the Assessment might be [...] a sample design and packaging system that has
very distinctly limited [...]. The limited testing period, fifty minutes or
less, is partly a function of the same limitations in the original conception,
but it is also a major source of those limitations.

From the very outset (1963), NAEP’s leaders maintained that they wanted
to move away from the norm-referenced testing of individuals toward the
measurement of national and subgroup levels of achievement.4 The matrix
sampling system provides several major benefits. First, it allows the
Assessment to measure performance on large numbers of exercises in each subject
[...] without requiring long testing periods and without being limited to
quickly answered memorization items. Second, this collection of data on
[...] exercises, given the sample design, permits the analysis and
dissemination of results for the national population, as well as comparisons of the
performances of various subgroups within subject areas. Third, the structure


4 [...] this orientation, NAEP was not interested in testing the same
individuals over time [...] not do so. Instead, each cycle will test different
children who have background characteristics similar to those tested earlier.





and the size of the sample make it possible to detect relatively small
differences in group performance over time. Fourth, the packaging system
produces certain economies as well as ease and relative uniformity in
administration. And finally, according to NAEP claims with which we disagree, the
short testing period of thirty-five to forty-five minutes is per se a positive
benefit of this system.

While the sampling design and packaging system are reasonably well
designed to provide aggregated data comparing performance over time within,
between, and among various groups, the Assessment’s data would have been
much richer with educational implications if it had simultaneously gathered
more information on the performance of individuals. Disregarding for the
moment the major limitations resulting from the selection of distinct subject
areas, the selection of objectives, and the development of the [...]
complete information on the performance of individuals [...]
subject areas might have produced [...]
advancement of learning theory or in the delineation [...]
questions and perhaps even curricula.

The packaging system of ten or [...] constrains the
analysis of individual performance [...] way. The [...]
criterion for the packages was that each must contain material from more
than one subject area. In the first cycle this meant that [...] package had
questions in Science, Writing, and Citizenship, [...]
package had questions in Reading and Literature. Thus there was extremely
small leeway for asking an individual [...]
within any one field; at the same time there was small leeway for having an
individual complete an array of exercises [...] meaningful across
[...].

[...] constraint was exacerbated by [...].
Each package had to contain exercises at three different
levels of difficulty. For example, a given student might [...].
Even if the [...] of such high quality that
they could precisely indicate the attainment of specific
subobjectives or objectives, this [...] virtually
precluded the gathering of useful [...] on the development of
reading skills. A [...] the easiest level might have
completed two [...] and not the other four. [...]
possible to know what the student could not do [...]
analyze what the student’s [...]
and problems were. Similarly, [...] be impossible to critically [...]
performance of a student who could operate at two or three difficulty levels
on some exercises, preventing the generation of hypotheses about the
relationships between different difficulty levels. This objection is not intended to
negate the group analysis utility of including the difficulty levels within each
package. Rather, it is intended to point out that it would have been quite
useful to also know how often, and in what instances, students operate at
different levels of difficulty, either within a particular subject or across subject
areas.

Beyond what educators would like to know about performance within and
across subject areas, and within and across difficulty levels, the packaging
and sampling system also prevents the creation of a critical mass of individual
data in another way. A major goal of the Assessment is not only to measure
knowledge of subject area contents but also to measure skills, understandings, and
attitudes. Again, the Assessment can do this by subgroups of the population.
These comparisons are interesting, but because of the inadequate
measurement of the skills, understandings, and attitudes, the resulting generalization
characterizing the reporting of subgroup performance produced little that
can directly aid researchers, much less educational policymakers. In addition,
however, information about differential individual performance on contents,
skills, understandings, and attitudes within a subject area would be quite
useful. Information about individual performance in each of these four
categories across subject areas would be still more interesting. And all this
information about individuals, when measured across difficulty levels, would be even
more productive for rigorous educational research.

This last-mentioned constraint, that individuals are being superficially
tested, or not tested at all, for their differential performance on contents,
skills, understandings, and attitudes, also feeds back into the first two
limitations discussed and further exacerbates their cursory qualities. Even as an
individual is tested on a strictly circumscribed number of questions within a
subject area, at varying difficulty levels, the nature of the exercises is also
subject to change, sometimes measuring contents, sometimes skills, sometimes
understandings, and sometimes attitudes, thus further reducing the possibility
of any adequately informed and complex statement about how individuals
perform within and across subject areas and within and across difficulty
levels, as well as across the four categories of contents, skills, understandings, and
attitudes.

It must be emphasized again that NAEP did not try to measure these complex areas and
fail. What we have considered is a combination of NAEP’s limited conception of what
kinds of information would be useful and the relatively subtle interplay among the sample
design, the packaging system, and the data that emerges from the Assessment. These
factors indicate the very limited extent to which present Assessment results
can be used by the educational community. And here we are not even talking
about answering research questions or informing policy decisions; we are
merely talking about the generation of research hypotheses, a function which
the present group results serve only in an extremely narrow and indirect way.
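
The limits on what group-level results can say about individuals can be illustrated with a small constructed example. The two groups below are entirely hypothetical, as are the exercise labels and function names; the sketch only shows that identical percentages correct on two exercises are compatible with very different patterns of individual performance. When the two exercises sit in different packages taken by different students, only the percentages are observable.

```python
def percent_correct(records, exercise):
    """Share of students in `records` who answered `exercise` correctly."""
    return sum(1 for r in records if r[exercise]) / len(records)


def share_passing_both(records, first, second):
    """Share of students who answered both exercises correctly."""
    return sum(1 for r in records if r[first] and r[second]) / len(records)


# Hypothetical group 1: success is clustered in the same fifty students.
clustered = (
    [{"science": True, "citizenship": True}] * 50
    + [{"science": False, "citizenship": False}] * 50
)

# Hypothetical group 2: no student succeeds on both exercises.
spread = (
    [{"science": True, "citizenship": False}] * 50
    + [{"science": False, "citizenship": True}] * 50
)

for name, group in (("clustered", clustered), ("spread", spread)):
    print(
        name,
        percent_correct(group, "science"),
        percent_correct(group, "citizenship"),
        share_passing_both(group, "science", "citizenship"),
    )
```

Both groups show 50 percent correct on each exercise, so an exercise-by-exercise report cannot tell them apart; only data linking the two exercises for the same students would reveal that half of the first group passes both and none of the second does.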

One further subtle but important restriction inherent in the problems
described above in this chapter also places serious constraints on the complete
understanding of the performance of groups. While the gross comparative
performance of groups can be reported within subject areas, across subject
areas, and across difficulty levels, there is no way really to [...]
performance capabilities are distributed within groups or whether [...]
differentially distributed across groups. For example, within the [...] of
low-SES, white, rural, thirteen-year-old males, [...]
authority whether many individuals can answer [...] questions in
all subject areas, whether some can do some types of questions in
some areas but not the same types or other types [...] other
areas, or whether the capacity to complete [...] across subject
areas is clustered within a small population [...] who make up
the group. Stated differently, in trying to [...] qualities of a
group, do some students that make up [...] particular [...] poorly or
well in all subject areas? Or are there [...] in performance
across subject areas, across difficulty levels, [...] the categories of
contents, skills, understandings, and attitudes? [...] students who do well in
science in a particular group [...] the same students [...] would do well in
art and citizenship? And so on. For some of [...] nominal amount
of information from which only weak inferences [...], and for others
there is no information whatsoever.

Beyond our wanting to understand [...] within groups, such
differences in capability may be [...] various groups, and
knowledge of these differences would [...]. This inadequacy [...]
limited intentions [...] inferences, as mentioned [...]
costs of omitting data on individuals, it is not surprising [...]
to test within no more than a single fifty-minute classroom period. For one
thing, there was a practical and, therefore, political gain from this decision,
since there would be minimal interruption of school routines. Surely there
might be reasons why NAEP might not have wanted to go to a two-day
testing period similar to that employed in Project TALENT. (During the first
meetings in 1963 and 1964, John Flanagan of Project TALENT strongly
suggested to the Assessment’s backers that the Assessment would be much
more useful if it measured individual educational progress, in depth, over
time, as well as measuring group performance. But there was no deliberated
response to his suggestion.) A more realistic and complex concept of what
knowledge is and how educational research might proceed would have
produced an optimal testing time that provided hard data for individuals as well
as groups.

NAEP tends to obscure these serious issues by making a link between the
conventional testing and ranking of children and its decision to have short
testing periods. Again and again NAEP promises not to gather enough
material to rank students against each other. Its publications repeatedly observe
that the Assessment is measuring performance on exercises, not measuring
and scoring the performance of individuals:


Package number 1 for use with age seventeen assessees in
assessment year 01 contained eleven exercises. Of these eleven there
were seven multiple-choice Science exercises, three free response
Citizenship exercises, and one essay Writing exercise. If one
attempted to add scores from seven Science exercises, plus three
Citizenship exercises, plus one Writing exercise, the total score
would have no meaning. But the purpose of National Assessment
[...] report separately for [...] exercise, not to report a score for
an individual assessee. Therefore, the project was free to package
[...] any [...] fashion that added up to about
forty or forty-five or fifty minutes of assessment time for each [...]

[...] the attempt to move away from norm-referenced testing did not
require such an extreme fragmentation of the data collected from individuals.
Even if students had been tested for several hours, there was no possibility
that they would be ranked against each other, since there were so few students
[...] at any given age level within any given school. In reality, NAEP’s

5 General Information Yearbook, 02-GIY (Denver: ECS, 1972), p. [...]




objective of avoiding the norm referencing of students has no inherent
bearing on the length of time during which individuals might be assessed.



8 


REPORTING THE 
ASSESSMENT RESULTS 


The potential uses and abuses of the National Assessment
results depend critically on how they are reported. Naturally, this
dependency can be considered only after analyzing the complex issues discussed in the
preceding chapters. The quality of the results available to be reported is
affected by the strengths and limitations in the selection of subject areas, the
delineation of subject area objectives and subobjectives, the development of
exercises, the measurement of descriptive background characteristics, and the use
of a matrix sampling system, packages, and short testing periods. These
decisions, plus several other types of factors, including the technical limits of
social science, the politics of federal-state-local relations, economic
constraints, time restrictions, the intentions of the Assessment’s original leaders,
poor management planning, and mediocre service by many of the major
contracting groups, have led to very serious weaknesses in the results that the
Assessment has obtained to date. Beyond these weaknesses, several
developments in the larger society have further limited the utility of the results.
(These developments are considered later in this chapter.)

But given the results that it has, regardless of their limitations, NAEP is
obliged to make decisions as to how they might best be reported. The primary
audiences for the NAEP reports, according to NAEP’s own definition of
purpose, include politicians, the lay public, scholars, and educators. The
intent is to convey results in ways that are “understandable, meaningful, and
useful.”1 To accomplish these ends, NAEP’s efforts produced several detailed
reports containing exercises and results in the five subject areas tested during
the first two years of operation,2 a variety of summary reports published in
NAEP magazines and newsletters, several articles written by NAEP staff
members or consultants to appear in non-NAEP publications, and releases
distributed by the NAEP public relations office to professional associations
and newspapers throughout the country.

Analysis of virtually all the NAEP publications and a sampling of mass
media articles on the Assessment indicate that none of the report formats,
with the potential exception of the several detailed reports, can be used as a
basis for major research or policy guidelines in improving education. Clearly,
the more popular reporting formats are intended in part to draw people’s
interest to the basic, more detailed reports, since the popular reports, though
thoughtfully edited, in and of themselves are unquestionably of little
substance. Beyond stimulating interest in the longer reports, they can be
justified only as having some very generalized [...]
might convince the public and concerned politicians [...] becoming
[...] and the fact that education is in some way evaluating itself.

But herein lies a major irony. The Assessment [...]
the lay public and relevant politicians about the [...]
achievement in the society, but some also expect [...] in [...]
determination of specific policies, voting of school [...], etc. Again and
again, despite Tyler’s and Tukey’s more humble view [...] the Assessment


1 General Information Yearbook, 02-GIY (Denver: ECS [...]

2 Assessment Reports

No. 1 Science: National Results
No. 2a Citizenship: National Results, Partial
No. 2 Citizenship: National Results
No. 3 Writing: National Results
No. 4 Science: Group Results A
No. 5 Writing: Group Results
No. 6 Citizenship: Group Results
No. 7 Science: Group Results
No. 8 Writing: National Results, Writing Mechanics
02-R-20 Reading: Selected Exercises
02-GIY Reading and Literature (General Information Yearbook)

July 1970
July 1970
November 1970
November 1970
April 1970
April 1971
July 1971
December 1971





would produce, ECAPE and then NAEP publications promised that policy
guidance would be a major function. Yet the several substantive, detailed
reports are quite unhelpful reading for either the lay public or politicians. If
they are to be useful to anyone, it will be to curriculum developers and
perhaps certain other education specialists. The public and inquisitive
politicians are once again left with an implicit caveat that “only the experts can
really understand.” On the other hand, reading the popularized reports on
the exercise results, which provide neither an overall sense of the levels of
functional literacy nor specific data that can lead to educational changes, is
much like flipping through a pile of unrelated curiosities — a form of Ripley’s
“Believe It or Not,” in which all the surprises have been edited out.

One more reservation must be expressed about the more popularized
reports of the Assessment findings. Again, ironically, it seems that if they have
any major potential whatever, it is likely to be their potential for
misinterpretation. The more generalized the level of statements about the results, the
more likely their meaning is to be misconstrued or even purposefully
misrepresented. This issue was considered briefly in Chapter 6; later in this chapter
it will become clear that concern on this count is grounded in actual events
and is shared by some leaders at NAEP. The general superfluity of the
more popular reports means that NAEP’s utility must rely heavily on the
more detailed reports that NAEP has published. Yet to date there is little
evidence that these reports contain findings that are both new and sufficient
[...] to be of specific use to researchers, curriculum specialists, or teachers
[...] professional groups working in specific subject matter areas. Moreover,
[...] reports have not supplied enough data to provide the overall
[...] of a census of levels of educational achievement, much less a gross
[...] of the society’s level of educational achievement. Finally, there has
[...] interest in [...] reports than was initially expected, as
evidenced by numerous in-house memoranda attempting to figure out how to
[...] interest. In order to increase the potential utility of the results,
NAEP [...] its reporting format between the first and second years of the
Assessment and described the change as follows:

[...] year of assessment (Science, Citizenship, and
Writing) [...] reported in what might be called a “phase”
format. Each phase volume treated a different aspect of the results
for all the exercises in a given subject area. For example,
within the [...] Science, the phase I report [...] the
national results for all the Science exercises, the phase IIA report
presented results for regions of the country, the sexes, and sizes
of community, and the phase IIB report presented results for
color, levels of parental education, and sizes and types of
community. A major disadvantage of this format is that it is difficult for
our readers to follow the results for specific exercises through the
various phases.

Because we feel that National Assessment results should be easily
understood and both meaningful and useful to educators and
other concerned persons, we have adopted an alternative reporting
format with the following qualities: (1) the data are presented
in a larger number of relatively small volumes so that the reader
will not be overwhelmed by the sheer size of a report, (2) each
volume contains exercises which cluster together in a way that is
meaningful to educators and scholars in the relevant subject area,
and (3) along with each exercise the reports give all data, national
and group, relevant to that exercise.

We believe that the clusters of exercises nos * * j> ^ ^ 

educators and scholars are what we ca em ^ require di 
exercises which share a common idea u m y the year 02 

verse behavioral responses For each su J ldes a n in for 

assessment, there is a summary '° ,ume area /, nc luding the 
mation specifically relevant to t e su J ral tren ds that 

themes and objectives for the are ) ^ tQ each theme in 

appear m the data A separate volu t hm the theme are 

which the data for the 
given along with a summary o 

While it is apparent that the [...] have been significant, it is not altogether [...] will be with the thematic reporting format. In the [...] attempt to do it, the development of the themes around which exercises are clustered is necessarily post hoc, presenting problems of validity and disproportionate measurement. Staff members at NAEP are presently debating the disadvantages and advantages of this approach. In [...] had to deal with two other major [...] concerning the reporting of results, both of which have [...] blocks to their use. The first question involves the percentage of the exercises and data that NAEP releases in its detailed reports and in the popular documents. The second pertains to the society's shifting [...] and interests and the difficulties of NAEP's attempts [...] results without attributing causality to background factors [...] particular levels of performance. Obviously, N[...] the area [...]


General Information Yearbook, 02-GIY, pp.





DECIDING WHICH RESULTS TO DISCLOSE 


NAEP has expended a great deal of energy in deliberating, designing, and implementing an exercise release system, but there is strong evidence that much of this energy has been spent unnecessarily because the basic premises of a partial release system were not fully analyzed by those responsible. The policy for the Assessment's first round was that one third to one half of the exercises and their related results would be released in the detailed NAEP reports. Given the fact that the first round results (in Science, Citizenship and Writing) were received with a mixture of disinterest and criticism because they did not provide enough information within any particular subject area or in terms of particular objectives and subobjectives, NAEP altered its policies in the second round so that "at least 50 percent but less than 100 percent" of the round two exercises in Reading and Literature would be released.

The procedures for selecting the exercises to be reported are quite complex. Their basic aspects are described as follows:


The primary purpose for developing a selection procedure was to insure that, although exercises would be selected randomly, they would nonetheless be representative of the total pool of exercises available. With such a procedure we can be reasonably certain that a report provides coverage across objectives and across all population group differences to the extent that such coverage exists in the entire pool of exercises assessed.

It is critical in a report that includes only a portion of the exercises assessed that we have exercises which represent the entire spectrum of data. For example, there should be exercises for which males do much better than females, exercises which show little difference between males and females, and exercises for which the females do much better than the males. Our selection procedure enables us to achieve this kind of representative coverage for each group, i.e., males, females, NE, SE, etc.
After the computer randomly selects exercises for reporting, we fill whatever gaps remain by looking for exercises which will provide us with an example of the type of data which is not in the set of exercises to be released. For example, if none of the exercises that are selected for release represent a large female advantage, one or more exercises are selected specifically because they show a large female advantage. This systematic selection proved necessary for the year 02 Reading report but not for Literature. With the systematic selection of 10 to 15 percent of the exercises, we are able to identify a set of exercises which are truly representative of the subject area being reported. In year 02 the selected exercises amounted to about 50% of all exercises. Since the exercises not selected for release during the first assessment cycle are withheld to be released during the second assessment cycle, those exercises designated for immediate release and those withheld for later release should be equivalent in two ways. First, both sets of exercises must be equivalent in their coverage of objectives, themes, exercise formats and/or any other relevant characteristics. Second, both sets of exercises must be statistically equivalent, i.e., they must have similar representation across the entire spectrum of percentages of success. This latter requirement prevents currently reporting, for example, that girls can read better than boys (on the basis of the released exercises) and then reporting five years hence that boys can read better than girls (on the basis of the unreleased or withheld exercises). This same consideration applies to all reporting categories.

In order to insure the necessary equivalence, National Assessment selects exercises for release [...] group the exercises by their nonstatistical characteristics (objective, theme, format, etc.). Then within each of [...] we [...] to achieve statistical equivalence by [...] an index which reflects group differences and can be used [...] sets of similar exercises.
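
The selection steps described in the quotation above can be sketched roughly as follows. This is only an illustration under stated assumptions, not NAEP's actual procedure: the field names (`objective`, `gap`), the five-point threshold, the pairing rule, and the function names are all hypothetical. The sketch groups exercises by a nonstatistical characteristic, pairs exercises with similar group differences, releases one member of each pair at random, and then fills any gap in coverage systematically.

```python
import random
from collections import defaultdict


def categorize(gap, threshold=5.0):
    """Label an exercise by a hypothetical index: female minus male percent correct."""
    if gap >= threshold:
        return "female_advantage"
    if gap <= -threshold:
        return "male_advantage"
    return "little_difference"


def split_pool(exercises, seed=0):
    """Split a pool into (released, withheld) sets matched on objective and gap."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for ex in exercises:
        strata[ex["objective"]].append(ex)  # group by nonstatistical characteristic

    released, withheld = [], []
    for group in strata.values():
        ordered = sorted(group, key=lambda ex: ex["gap"])  # neighbours are similar
        for i in range(0, len(ordered), 2):
            pair = ordered[i:i + 2]
            rng.shuffle(pair)            # random choice within a matched pair
            released.append(pair[0])
            withheld.extend(pair[1:])    # empty for an unpaired last exercise
    return released, withheld


def fill_gaps(released, withheld):
    """Move one withheld exercise into the release for each category it lacks."""
    present = {categorize(ex["gap"]) for ex in released}
    for ex in list(withheld):
        category = categorize(ex["gap"])
        if category not in present:
            withheld.remove(ex)
            released.append(ex)
            present.add(category)
    return released, withheld
```

Pairing similar exercises before the random draw is one simple way to keep the released and withheld sets statistically comparable; the quotation does not say how the index was actually applied.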

These partial release plans have [...], including restricted report coverage, exercise obsolescence, [...], and, despite the efforts mentioned in the quotation above, nonequivalence of the exercise pools between and within [...]. We will not discuss these shortcomings here, partly because NAEP has already expressed some in-house awareness of them and partly because there is a more fundamental issue which decreases their importance.

The original decision to design some form of partial release plan was based on security concerns. Various educational [...] that teachers throughout the nation would teach to the test [...] would lead to the creation of a national curriculum. [...] NAEP set up its release plan as a response to these [...] important to the [...] was NAEP's own alarm [...] to the exercises would invalidate the Assessment: if the nation's students knew the exercises because they had been taught the exercises, the Assessment's usefulness as

General Information Yearbook, 02-GIY.





knowledge would be undermined. Among other problems, over time the schools within the NAEP sample might do better and become unrepresentative of the nation.

Under the NAEP system, however, it is absurd for any particular teacher to "teach to the test." Given the sample design and the packaging system, only a small number of students within any given school is tested at any of the three in-school age levels. A teacher has no idea of which of his or her students will be tested and, therefore, would have to prepare them all. Furthermore, teachers would have to prepare all these students to answer several different exercises in the subject areas to be assessed, since they will not know which exercises will be in the specific packages to be administered. And, finally, the results of student responses would not be known anyway to either the local teacher, the child's parents, or the local school administrators, and would not be reported nationally by identifying the local district.

There has been much continuing debate at NAEP about alternative exercise release policies which would ease if not eliminate several of the existing problems, including inadequate report coverage, exercise obsolescence, and [...]. The problem of nonequivalence between cycles must still be dealt with, but it seems certain that a combination of selection from past [...] of new exercises could provide a sensible balance between the [...] of comparability and flexibility. One aspect of the 100 percent release option requiring further study is whether 100 percent of the assessed exercises should be released or whether 100 percent of the entire pool of exercises, [...] unused exercises without results, should be released. In most subject [...] pool [...] exercises available is larger than the actual number [...] Assessment, although more selective judgments about which exercises [...] this [...] sharply.

In any [...] the entire pool or just the assessed exercises are released, 100 percent disclosure would be of [...] help to states that are establishing [...] and wish to use Assessment exercises for economic reasons, and in some cases for comparability. At present the pool of NAEP exercises to select from is relatively small, especially when the imperfections of many of the exercises are taken into account. Also, some [...] independently paying the NAEP contractors to develop exercises for [...]. It appears that they are paying for exercises [...] more than [...] of many of the unreleased NAEP exercises. [...] this practice [...] might become quite widespread as more states develop their accountability systems, the states could avail themselves of substantial economies if the NAEP pool were fully disclosed. Of course, there would be additional costs involved in NAEP's updating the pools, handling the related clearinghouse functions, and disseminating the exercises, but these might be considerably less than for each of the fifty states to establish its own exercise pool independently.

Another issue deserves attention here. While the apprehension that teachers will teach to the test appears to be completely unfounded, the possibility remains that teachers might teach from the test. This of course is true whether 50 percent or 75 percent of all the exercises are released. And it [...] be even more the case if states decide to use the same exercises to [...] extent. Within each of the subject areas the objectives, subobjectives, and exercises might to some degree become curricula for some local systems and states, perhaps even nationally, although it is difficult to [...] whether this would produce major changes from what [...] America's schools. While some critics might [...] would be deleterious because it might lead the schools into new areas either unwillingly or unwittingly, other critics would argue [...] position: that because of its least common denominator [...] the Assessment would become a drag on educational change by too strongly affirming the importance of what is now being done in the schools.

It is these issues that make the earlier discussion of the Assessment's subject matter objectives so important. [...] performance, on the actual practices [...], on the stated goals of the schools, or on the normative goals as [...] the Assessment leaders and educational experts? If they [...] schools' actual practices or stated goals, which schools should [...] representative? [...] suggested in Chapter 4, the consensus mode [...] defining objectives is relatively vacuous, is it possible for NAEP or any [...] body to develop a system of objectives or a process for defining [...] objectives which takes appropriate account of educational [...] cultural [...]? [...] perhaps more important, if a few NAEP leaders do not [...] a viable expression of some imagined unitary American value [...], [...] the consequences of their taking a [...] as specific things that students should be taught, or know how [...] exercises can really [...]

Two [...] must be made [...] from NAEP's internal point of view. If a majority of its leadership really [...] that the objectives are [...] derived from a valid process, and that the exercises are good measures of the objectives, then it should support their dissemination without worrying about whether or not some teachers and school systems will teach from the test. The question of how many exercises should be released has little, if any, negative bearing on this judgment, because some may teach from 50 percent of the exercises, which might produce a more undesirable slant in curricula than 100 percent of all the assessed exercises or the entire pool. Stated differently, if NAEP really has faith that what it measures are desirable educational achievements, what would be the harm if students actually learned to do well on its measures? Thus, in a roundabout way the Assessment's release of all exercises could in the long run become one of the many causes of the educational achievement which the Assessment measures, but from NAEP's point of view, at least, this would not thwart its primary efforts: to measure and improve educational progress.

A final remark is necessary on the 100 percent disclosure plan. Unfortunately, given the inadequacies of many of the exercises, their loose fit with the objectives and subobjectives, and the limited amount of hypothesis generation that comes out of the sampling and packaging system, it is doubtful that [...] the unreleased exercises will really [...] results for serious educational researchers. This is especially true since the released and unreleased exercises are quite comparable, as described in the quotation above from the General Information Yearbook. This comparability suggests that most of the inadequacies that have shown up in the released exercises are likely to show up in approximately the same proportions among the unreleased exercises. This reflection is not intended as an argument against the 100 percent release, however, which would not only save NAEP [...] problems, but would also help mitigate other inadequacies of the partial release system, and might provide extra data that would [...] schools develop both accountability systems and curricula.


REPORTING THE RESULTS: DESCRIPTION VERSUS
EXPLANATION AND PREDICTION

The issue of whether NAEP's reporting categories are descriptive or explanatory and predictive has become increasingly problematic. As discussed in Chapter 6, Tyler and Tukey saw the Assessment as measuring national levels of educational achievement [...] pockets of excellent, average performance, and low performance. While they expected to use various measurement categories to find out where these pockets were, they did not expect to use these same categories to explain why the pockets were where they were or to predict where they would be in the future.

Despite Tyler's and Tukey's early and continuing humility in this regard, and despite clear technical and political limits on the utility of explanatory analyses, various important developments in recent years have blurred these inherently hazy distinctions.

As of 1970, NAEP's position on the issue of interpretation was ambivalent, but candidly so:

It has long been apparent that National Assessment [...] multiple reports, of differing types and [...]. Detailed, voluminous reports of every exercise selected for [...] in a given year must be prepared. These will be [...] National Assessment reports. Such reports, however, [...] themselves to immediate or "obvious" conclusions. Someone, hopefully many different "someones," must pore over and [...] results in a fashion that is most meaningful. Whether this latter step is one that National Assessment itself should [...] or whether it should be left to the scientist, to the classroom [...], to the school board member is a moot question. Some [...] that National Assessment should provide information only, leaving all interpretation up to the user of the results. Others [...] some attempt at interpretation is necessary, [...] hypotheses that are tenable, unless one wishes to run the risk of gross misinterpretations. This issue has not been settled yet. As is often the case, [...] middle ground may well [...]

There are two particularly important [...] the above statement. The first is that one can conceive of many [...] interpretations that [...] from the NAEP results, not just [...] be made along the dimension of description versus [...]. The [...] NAEP, as in so many other instances, [...] but expects to resolve them through middle ground and often fails to do so adequately. [...] knows its reports [...] made [...]




measurement of outcomes since the Assessment was conceived in 1963. The effect of the several major surveys of educational achievement and inequality (as mentioned in Chapter 6) has been to make most of NAEP's findings quite unsurprising. If released between 1963 and 1965, many of NAEP's subsequent findings comparing performance between and among groups would have generated great excitement, but after Coleman and a large variety of impressionistic books on educational inequality, NAEP's major achievement findings had become relatively common knowledge. (Though this is not true of the Citizenship results and of various process-oriented exercises in the other subject areas, NAEP has not seen fit to invite interpretations by major sociologists and political scientists, perhaps partly because the validity of so many of the exercises is questionable.)

[...] even when NAEP does come up with findings that are somewhat different [...] of well-known studies, there is general agreement that the NAEP [...] to accomplish anything but perhaps raise some new broad [...]. An example of this problem appears in an interpretive article in the March 1972 issue of American Education:

The [...] data show black [...], [...] the inner city, and [...] parents [...] the least education all tending toward catching up with national performance in science as they increase [...]. [...] black nine-year-olds had a median of 14.5 percent [...] correct than the median for nine-year-olds in the nation, [...] black seventeen-year-olds turned in a [...] 11 percent below national performance.

[...] the results [...] nationally standardized achievement [...] by big-city school systems are likely to find this startling. The typical pattern from national achievement testing is one [...] widening gaps between the progress of black and other students, [...] living [...] economically depressed city areas and those who live [...] more favorable circumstances.

[...] National Assessment data mean these black students and others are learning more than previously believed? [...] distinguished scientists who were asked to comment on [...] trend suggested that the sampling and statistical procedures [...] National Assessment [...] made [...] drawing definite conclusions. Robert J. Havighurst, [...] the University of Chicago, noted [...] that there is no way of telling for sure whether the tests for each age group were equally difficult. In addition, by age seventeen many less able students may have dropped out of school, and Assessment officials admittedly have been less successful than they had hoped in including dropouts in the seventeen-year-olds' sample. Therefore the improvement of black seventeen-year-olds could be merely a reflection of their higher dropout rate. Havighurst and others concluded, however, that it is probably safe to say that from ages nine to thirteen the performance of these groups does not decline.

Even if this evidence is only suggestive, it is still contrary to expectations set by national achievement test data.

This is a simple illustration of the general problem we considered in Chapter 6 in the discussion of whether NAEP should begin measuring [...] data. NAEP continues to face the problem. Weaknesses in the educational [...] have helped to diminish interest in interpreting the Assessment's achievement [...]; concomitantly they have increased the extent to which emphasis is placed [...] NAEP and others on the Assessment's findings with regard to subgroups, [...] the extent to which the distinction between description and explanation [...] important. Instead of the emphasis being on what the Assessment's implications are for education and particular subject areas, it has largely focused on [...] performance of subgroups and why they perform differently.

Still another factor concerning historical context complicates these developments. The re-emergence of the classic [...] genetic differences and their effects on intelligence and [...] achievement has significantly altered the societal context in which [...] reporting categories are interpreted. As Keppel assumed at [...], the Assessment's reporting by racial background might have [...] to recognition of inequality of opportunity and [...] educational or social reforms. While some clearly interpret the results in this same framework today, the social and political climate is [...] many who would interpret low black performance as [...], and would argue that schools cannot be held responsible for an unsuccessful education of poor black children.

NAEP enters this [...] in its milder forms [...] outside interpretations such [...] following one from the supervising principal of Pocantico Hills, New York:


— — 

Hope Justus, American Education, vol. 8 (March 1972).





Someo ne a.lced ,( on the bas , s o( |hls „ was ( 
whether Johnny can read and I’d say the answer K yes a 
" ( n y ls Sus ' e a " d she llv “ m the Northeast and she co 
me h“ t. ^ and her P arcnts are well-educated, th< 
* f, She pr l obabl >' 030 read If Johnny ,s a he, if 
»mewhete the vast Southeast and he’s Black and ht 
Am-, ^ 3n ^ su ^ ers ^ rom the disadvantages th 

wTd P T" y Md 1,1 health — he probably c 
ve;y well despite whatever he has go.ng for htm 

thmk th« ay bejU T P ' ng t0 “"clustons but I say that I 
Ind Vra . Pa ,7 °' tHe Pr ° blem » revealed tn the 
more thaTs V S ' > -^ ha, "’ S m ° re 'han jus, the sch 
problem n 'a' P ^ teacb ent and teaching of reading I th 
pfatL mhe Jr?' " “<** that we’re putting our d 
to look im , ba " places we might put them I think 
£rWds a„d .“T thoSe wh ° ««*d, '"to the, 
^en^four h ‘ u he u' ndS ° f problems 'hey deal «, 
2“! , ; t ay basis ’ days a week and th 

been clamonn^? 0 " £ r ° V ' d<:S US mth ,b e information et 

po^nlrwZ probl - » now, t, 

directly'toward '‘ meS i'^ , "" :rpre ' at,on of N AEP results mm 
1 r nCeS ’ ° r s ' at '™nls eemforcmg .1 
I hI , U ‘ 1,!t,C POSmons because of the IQ del 

The Hope Justus article quoted earlier contains these 1, 

-am^e^Tt a" P CT B ^ “ a «-»«* mdica 
exhibit a curiositv Ind Wack c h‘'dren than n 

around them It r,,I n ‘Inrst'oning attitude about tl 
mnh 1,11k or no d,Jf„ mcr a, ^'S ! omclm, s „f quest, omng co! 
non black thirteen year-oM^ C<! i a peranI ages of bl 
a result of natural,-,' glra,fa have long 
equal in h^a^” 1 -' 11 "™'. both groups show , 

[Emphasised }> “’"t™" b ' B °' m,S,nf " 

Wh "' -J^thes uggesnon .ha, co, or per se makes a d„„ 

lecturer, Unnersny^f'^'^^J" 1 ’'? "'" 5 P mm P al Poeamico I 

— *«*■* «• * p f j^u.rto 'r" d “ ra,m ■“ d -> ■ 

Ju«us op at , p 7 





exercise results, Justus goes on to offer these patronizing (and perhaps malicious, in the case of the giraffe example) items of evidence about occasional instances when blacks do equally well or better. Though the article ends with the presentation of an environmentalist position that suggests much might be done to enhance the educational achievement of blacks, the section just preceding the end summarizes the "limitations" of blacks, and also provides a gross contrast with affluent whites:


With regard to the kinds of exercises in which various groups perform best and poorest, the results established [...] clear patterns for blacks and affluent suburban youngsters. They show that blacks generally do best on exercises dependent on [...] and [...] simple [...]

For example, about 96 percent of [...] nine-year-olds can do an exercise that requires [...] one weight by hanging a second weight on a balance. And again, the percentage of black thirteen-year-olds [...] know that teeth are brushed to keep them from decaying [...] about the same as that in other groups.

Black performance lags the most on exercises that require use and interpretation of complicated data, [...] sophisticated equipment, and knowledge of difficult [...] words or facts remote from daily life. That manipulation [...] abstractions proves particularly difficult for blacks shows up [...] in their differing performances under the four science objectives. Exercises intended to find out whether students possess [...] and abilities necessary to engage in the processes of science were precisely the ones on which they displayed their poorest [...]

For example, only half as many black thirteen-year-olds as thirteen-year-olds in other classifications [...] a graph and tabular data to determine the daily [...] of a dog. In contrast, youngsters attending the [...] suburban schools tended to make their strongest showing [...] comparison with [...] rest of the nation in just this area, [...] relatively abstract principles and the use of knowledge [...] learned in the classroom.

NAEP has been trying to deal with [...] extent, and to take steps to minimize [...]. But, [...] might [...] predicted in the 1970 statement quoted earlier, [...] has attempted [...] problem of description versus explanation by finding some middle ground. The result to date has been a muddled policy.

At the very beginning of the detailed report on Science Results as described by sex, region, and size of community, NAEP undertakes to clarify its position:

There is a kind of interpretation that should never be made on the basis of the sort of figures given in this report. The fact that figures reflect Southeast performance or Big City performance does not mean that the performances thus reflected have arisen precisely from living in the Southeast or in a Big City, or from the attitudes, techniques, facilities and staffs of the school system involved.

In particular, just what happens in a region involves other things than that region's schools. Larger fractions of the children in some regions belong to a particular size-of-community group. Thus effects due only to size of community can appear to be regional differences. Larger or smaller fractions of the parents in some regions have particular amounts of education. Thus effects due only to parental education can appear to be regional differences. And so on. Migration from one region or size of community to another can further complicate the picture. There are such difficulties, some of which we know how to adjust for, and some of which we do not.

It is important for us to distinguish between an interest in causes and an interest in what the present situation is. To guide readers who want to think about causes, numbers usually have to be found by looking at (or considering) combinations of several classifications. Such numbers will be more appropriate for thinking about causes than the sort of number given in the present report, although they may still be far from perfect. (Such numbers will be given in a later report.) To guide readers who want to compare today's situations, as they stand, say region versus region or one size of community versus another, it is appropriate to look at regions separately or at sizes of communities separately, which has been done in this present report. This distinction between causes and present situations is important. Readers should be careful to understand it and then keep it in mind.11

Here NAEP asks the reader to be especially alert to the distinction between description and the understanding of causes. Yet at the same time NAEP itself continues to misunderstand or misrepresent that distinction. It makes


11 Report 4, 1969-1970 Assessment (Denver: ECS, 1972).






the unnecessary point that total causality should never be derived from a single classification. The example used (that region by itself cannot be taken as the cause of differences in educational achievement) is depressingly obvious and oversimplified. But more seriously, the overall presentation and the example encourage loose inferences that most classifications or combinations of classifications below the regional level do have simple and direct, and easily discernible, cause-and-effect relations with educational achievement; for instance, the sentence "Thus effects due only to size of community can appear to be regional differences." (Emphasis added.) It is startling that NAEP would suggest that it knows, or will soon know, to what extent educational achievement effects are due only to community size. No research group has developed data or analysis techniques sophisticated enough really to answer


To make matters even more confusing, the statement of the Report's limitations then goes on to say that one really cannot expect to learn too much about causes in terms of single classifications when compared with what one can learn from combinations of several classifications, [...] that the [...] information [...] a forthcoming report [...]

[...] might seem unnecessarily [...] necessary. NAEP's statements, [...] at hand, are quite significant [...]ment quoted above, it is clear [...] in 197[...], decided to try to ride the thin edge of this political and [...]. It simply states that understanding causes is one thing, describing the present situation is another, but then it goes on to imply [...] do both.

Clearly this dilemma has been an [...] for NAEP, and a year later, in the Introduction to the [...] 02-GIY, the statement of limitations was largely rewritten to come [...] what NAEP is actually capable of doing:

When the data show that a group [...] either above or below the nation as a whole, [...] exercise great caution in attributing causation to the [...] differences. National Assessment is not intended to provide reasons for differences; its purpose is to describe such differences if they exist. Many factors may affect an individual's [...] acceptable responses to [...]. Consider, for example, a [...]





have excellent physical facilities and high quality faculties, belong 
to high socio economic families, have parents with a high level of 
education, come from homes with many reading materials and so 
on All these factors could contnbute to the group’s high level of 
achievement while membership in the group itself may contnbute 
very little or nothing 

The name of a group is merely a categorical label. When we
look at the data for a given group, therefore, we cannot say that
any difference in achievement between that group and the nation
as a whole is attributable solely to membership in that group. In
other words, a group must not be construed as necessarily being
the cause or even being a cause for the comparatively high or low
achievement of that group as compared to the nation as a whole.
Often a disproportionately large percentage of the members of
a group of interest are also members of particular groups defined
by other factors. All these factors may contribute to the group’s
high (or low) level of achievement. The data obtained from these
groups do not allow one to evaluate the effectiveness of the
educational process on these groups apart from the advantageous (or
debilitating) factors. A statistical procedure called balancing
adjusts for the disproportionate distribution of group members in
other categories or groups for which there are adequate data
available. This procedure gives the achievement data for the
group in question that would have been obtained had the members
of the group been distributed proportionately across these
other categories or groups. National Assessment data, balanced
for disproportionate representation, are presented in a special
research volume. Again, great caution must be exercised in
interpreting the balanced data. The balanced data still reflect many
extraneous factors not assessed by National Assessment and, therefore,
are still not “pure” measures of the impact of membership in the
group in question. Even with balanced data, a group must not be
construed as necessarily being the cause or even as being a cause
for the differential achievement between that group and the
nation as a whole.12
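
The idea behind balancing can be illustrated with a small sketch. The code below uses direct standardization, a simple reweighting that is an assumption chosen for illustration and not necessarily the procedure NAEP itself used. The function names, the single background variable, the scores, and the national shares are all hypothetical.

```python
from collections import defaultdict

def raw_mean(records):
    """Unadjusted mean score for a list of (category, score) records."""
    return sum(score for _, score in records) / len(records)

def balanced_mean(records, national_shares):
    """Mean score the group would show if its members were spread across
    the background categories in the national proportions."""
    by_category = defaultdict(list)
    for category, score in records:
        by_category[category].append(score)
    # Only categories with data can be used; renormalize their shares.
    covered = {c: s for c, s in national_shares.items() if c in by_category}
    total_share = sum(covered.values())
    return sum(
        (share / total_share) * (sum(by_category[c]) / len(by_category[c]))
        for c, share in covered.items()
    )

# Hypothetical group in which 3 of 4 members have college-educated parents,
# against a hypothetical national share of 40 percent.
group = [("college", 80), ("college", 78), ("college", 82), ("no_college", 60)]
national = {"college": 0.4, "no_college": 0.6}

print(raw_mean(group))                 # 75.0
print(balanced_mean(group, national))  # about 68: 0.4 * 80 + 0.6 * 60
```

In this made-up case the group's apparent advantage shrinks once parental education is balanced. As the quoted passage warns, the balanced figure still reflects every factor that was not measured.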

In this case, NAEP clearly declares its intentions and capacities: “National
Assessment is not intended to provide reasons for differences; its purpose is to
describe such differences if they exist.” It then describes its plans to balance
the data, to control particular variables, and to report performance levels for
combined classifications that are underrepresented in the sample. So far, the
intent is still clear, as is the fact that if NAEP has interesting results to report,


12 General Information Yearbook, 02-GIY, pp. v-vi.





the balancing and combining of classifications will help to describe relatively
precisely the relationships among the classifications and the exercise results.
But then, once again, NAEP slips back into extended remarks which blur its
own distinction between description and explanation. In Chapter XII of the
General Information Yearbook, 02-GIY, NAEP describes the balancing system in
more detail:


THE MERITS AND WEAKNESSES OF ADJUSTMENT 
(INCLUDING BALANCING) 

The educational administrator wants to make comparisons between
groups, to find out who is learning more a[nd ...] groups [...]
hopes of being able to improve performance in the lagging [...].
Indeed, he would like to go further and find out what [...]
change and how much changes in these [...] the educational
achievement of the students. For example, when we find that boys
know [...] the [repro]ductive system of both sexes than do girls,
[...] the question of strengthening the education [...].
[Inevi]tably the desire is to subdivide the country into [...] to make
comparisons between subgroups that have everything alike except
the variable being studied.

In other words, we search for causes of differences. Unfortunately,
we cannot have everything [...] problems and rarely in physical
problems either, and so we are not actually able to carry out the
precise programs [...] be better [...] program that [...] and make
comparisons in performance among [...].

One thing that happens is that as [we introduce] several variables,
the number of subdivisions gr[ows ...]. For example, if we have
5 variables with 2, 3, 5, 7, and 4 categories respectively, we have
2 × 3 × 5 × 7 × 4 = 840 subgroups, and a sample of 8,400 people
would give only 10 per subgroup. Naturally many subgroups [would]
be empty and many fuller than 10, but it will [be difficult], if not
impossible, to make comparisons among subgroups, for [...] will
contain too few people.
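
The arithmetic in this passage can be checked directly, and a small simulation shows why an average of 10 people per subgroup still leaves many cells thin. The category counts and sample size come from the text. The unequal category shares, the cutoff of 5 people for a "sparse" cell, and the function name are hypothetical choices made only for this sketch.

```python
import math
import random
from collections import Counter

LEVELS = [2, 3, 5, 7, 4]   # categories per variable, as given in the text
SAMPLE_SIZE = 8400

n_cells = math.prod(LEVELS)            # 2 * 3 * 5 * 7 * 4
avg_per_cell = SAMPLE_SIZE / n_cells

def simulate_cell_counts(levels, n, rng):
    """Assign n people to cells using unequal, made-up category shares."""
    # Hypothetical shares: category k of each variable gets weight k + 1.
    weights = [[k + 1 for k in range(m)] for m in levels]
    counts = Counter()
    for _ in range(n):
        cell = tuple(
            rng.choices(range(m), weights=w)[0]
            for m, w in zip(levels, weights)
        )
        counts[cell] += 1
    return counts

counts = simulate_cell_counts(LEVELS, SAMPLE_SIZE, random.Random(0))
empty = n_cells - len(counts)
sparse = empty + sum(1 for c in counts.values() if c < 5)

print(n_cells, avg_per_cell)   # 840 10.0
print(empty, sparse)           # depends on the assumed shares and the seed
```

The first printed line reproduces the passage's figures. The second depends entirely on the assumed shares, so it only illustrates the point that uneven distributions leave some cells nearly empty while others are well filled.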

We might try to avoid these sparse cells by only looking [...]
in parental education [...]

It is natural to ask, “[...] between these





extreme types of community have been if the distribution of
parental education, sex, color and region had been the same for both
types of community referred to above?” Were it possible to
rearrange the world to equate these distributions for each type of
community, the effects upon our nation and its schools would be
profound. Such rearrangement is not possible. It is usually
appropriate to think of such balanced results as reflecting the
differences we would see in the absence of masquerading by the other
four factors. We can be reasonably sure the balanced results do a
much better job than the unadjusted results of reflecting such
differences.

Still another question concerns the combination of factors. The
performance of a given group may be found to differ, depending
upon subgroupings on other variables. Thus, the effect associated
with extreme affluent suburbs may be different in the Northeast
and the Southeast. Or the effect associated with sex may be
somewhat different for Blacks and Whites. Such interactive differences
may be of importance; balancing does not adjust for them.

It is natural to ask whether this or any such method of analysis
can help us. To some extent they can aid, to some extent not. We
cannot make up for cases we don’t have, but we may be able to
supply approximate analyses that will come near to answering
such a question as: what is the effect of region of the country on
performance when you control for size and type of community
and several other variables? If the effect of region is substantially
reduced by the analytical adjustment, we may be inclined to
think that region is not in itself the cause of the raw differences as
much as the other variables. One role of adjustment, then, is to
help us make approximate comparisons and summaries that we
cannot make by directly subdividing all the variables.

Elsewhere (see Foreword and chapter 8) we have many
cautionary remarks about the dangers of misinterpreting the
causative powers of given background variables, for they may be
poorly measured and they may not mean what they say. For an
example from the field of warfare, in World War II the more
fighter opposition bombers had, the closer to the target were the
bombs. Why? Fighters didn’t come up when the weather obscured
the target. Such proxy variables, especially when their correct
interpretation may be the absolute reverse of their obvious effect,
put us in grave danger of making mistakes. We do not go further
into that here.

Nothing but experimentation, if that, can serve to demonstrate
what the actual effect of changes will be. We are, however, trying
to get hints and insights from the data we have. Furthermore, if
someone does have a causative model involving the variables
National Assessment measures, he does have a chance to check it
against these results.




[...] of analysis and adjustment is to help [...] the data [...]
information that they cannot give in their [raw] form. Aside from
the dangers of misinterpretation, we have the political arguments
for and against adjustment. First, against: [if] adjustment for
background variables seems to reduce the differences between a
group of the population and the national average, it has been
argued that this tends to minimize the disadvantage of the group
and, it is further argued, that adjustment should not be made.
The direction of the effect of an adjustment is not necessarily
one-way; adjustments can increase differences as well as decrease
them. Those arguing against adjustment in the reduction case
would presumably argue for it in the case of increased
discrepancies.

A second argument favors adjustment. It argues that we must
adjust for important variables (presuming that the adjustment
will reduce effects) so that we show the potential of the
disadvantaged group.

Clearly the people making the first and second argument want
the same thing, to improve the position of the disadvantaged
group, and of course, this is a national goal. Steps toward
achieving such goals do depend on searching for causes and methods of
improvement, on finding weak spots in a system, and so on. We
should, therefore, look at our data in every way we can for hints
about how the system works and how to improve it. Analysis and
adjustments are tools for doing this. The question is not whether
to adjust or not, but, “What are the useful ways?” and “What do
the variables mean?” “What further variables do we need to
measure?” and “How shall we interpret the results?”

NAEP’s philosophy of “half a loaf may be better than none” has no innate
virtues to recommend it over a more direct and realistic statement that
NAEP’s purpose, and its only really feasible potential (conceptually,
economically, and politically), is that of description. Balancing can be employed for
the sake of more sophisticated description, but again it should not be confused, as in the
above quoted paragraphs, with balancing for the sake of causal analysis.

NAEP seems unable to say, in an era of great interest in causal
relationships between inputs and outputs, that it does not have the data or the
techniques necessary to accomplish conclusive and fruitful work in this area.
Here we refer back to the observation of the Gilbert and Mosteller
memorandum in Chapter 6. It is clear that NAEP is very unlikely to make
pathbreaking progress in terms of identifying manipulable variables (especially school
variables) that will lead to precise policy implications for either the federal
government, state departments of education, or local school [districts. The
only way] for the Assessment to move toward fulfilling its own objectives and





society’s expectations is by making those objectives and expectations much 
more patently realistic. Recognizing that an admission of more circumscribed 
utility might jeopardize federal support for the Assessment, it still seems ob- 
vious that NAEP’s energy spent trying to do things it cannot do is really 
energy wasted, and that sooner or later society, and Congress in particular, is 
sure to understand what the Assessment can and cannot produce. NAEP 
itself should be among the first to recognize its own limitations, clarify its real 
potential, and concentrate its energies on fulfilling it. This and other related 
questions on the Assessment’s utility are examined in more detail in the next 
chapter. 



9 


CONCLUSION: 

PAST and FUTURE 
USES OF THE ASSESSMENT 


There are many ways to ask the question, “What has the
National Assessment accomplished?” One might inquire as to what uses can
be made of its results. Are the results worth the more than $25 million that has
been spent to obtain them? In coming years will they justify federal
expenditures of several million dollars per year? In our judgment, these and other
important questions about what the Assessment has accomplished can be
answered judiciously and gainfully only by taking a broader view. We must
return to the typology of the Assessment’s early objectives, as presented in
Chapter 2, and consider which of these objectives have been achieved
relatively successfully and which have not. In addition, we must consider any
new objectives that have been added to NAEP’s definition of its purposes.
Given the extensive evidence in Chapters 3 through 8 regarding the
Assessment’s many serious conceptual, technical, and procedural deficiencies,
our consideration of a broader view of what has been accomplished might
seem to be little more than an exercise in courtesy. But it is not. Trying to
understand the Assessment’s worth and its achievements requires taking into
account all of its objectives. In the long run, even if the National Assessment
ceases to exist, the attainment of some of its less publicized objectives may
exercise major influence on American education, regardless of its
performance on its more widely known objectives.

Below we review the original major objectives, in the reverse order of their



presentation in Chapter 2. Reversing the order seems helpful because it
allows us to move upwards from the most operational to the most emphasized
and major objectives.

E. Operational Objectives

12. To create an independent committee to manage the
development of the Assessment.

13. To develop widespread acceptance among the educational
establishment so that the Assessment could gain
access to school systems.

14. To develop widespread political support so that the federal
government would take over the funding of the Assessment,
while at the same time assuring that representatives
of state and local governments would not be too
uneasy about the project being federally funded.

15. To develop lists of educational objectives that would
“fairly reflect the aims of American education” and
serve as guides for the exercise writers.

As we have seen, the Assessment succeeded in achieving the first three of
these important operational goals. Given the complexities of educational
politics, attainment of objectives 13 and 14 is no minor feat. The expenditure
of time, energy, and money involved was quite substantial, but the resulting
success was essential to pursuit of any of the other objectives.

The detailed analysis of the Assessment’s subject matter objectives in
Chapter 4 brings attention to serious shortcomings in the attempt to fulfill
objective 15, when it is broadly defined. But if this objective is more narrowly
construed, some significant successes are apparent. Considering the broader
definition first, the consensual style of developing the subject matter
objectives may appear to “fairly reflect the aims of American education,” but, as
we have seen, in general the objectives reflect merely the least common
denominator objectives that most people can agree on, to the neglect of any
of the deeply felt aims held by various individuals and groups. When viewed in
a narrower sense, however, objective 15 has produced some unexpected
consequences of considerable positive significance. While most of the subject
matter objectives are quite conventional, those in the fields of Reading,
Music, and Career and Occupational Development represent substantial
improvements which could have propitious effects on educational planning at
the local, state, and national levels. Though some of the original Assessment
leaders viewed the delineation of subject matter objectives merely as a device
to help the exercise writers, in a few instances the objectives have turned out
to be useful in their own right. As we show later, the dissemination of NAEP’s





[...] state de[...] and [...] NAEP [...] and of itself, at least by
NAEP leaders.


D. Major Low Profile Objectives

9. To lead a movement away from relying solely on norm
referenced testing which discriminates among individuals
and toward some form of objective or criterion
referenced tests that assess how much an individual or
group actually knows about a particular area of
knowledge.

10. To lead a movement away from current testing which
relies largely on measuring knowledge in ways that
overemphasize memorization and that underemphasize
actual skills, understandings, and attitudes.

11. To encourage new modes of testing that are better fitted
to the kinds of information being gathered and the
particular characteristics of the respondents.

The evidence in Chapter 5 indicates that NAEP employed an extremely
narrow definition of criterion referencing, that it overlooked the implications
of the basic theoretical concepts underlying criterion referencing, and that it
did not really produce criterion referenced exercises of any significance.
Similarly, though with certain interesting exceptions, NAEP was generally unable
to develop valid and reliable process exercises that could advance the
measurement of skills, understandings, and attitudes.

But substantial evidence shows that, mixed with these procedural failures,
NAEP’s interest and ambitions in these directions, and its development of
some successful new modes of testing, have exerted a potentially very
significant impact on educational testing in general. In other words, while NAEP
did not have the capacity to advance these new fields, it did help to convince
many groups of the current limitations of relying so heavily on norm
referenced tests and narrowly defined cognitive measures. Interest is now
widespread in finding more constructive and comprehensive measures of
individual educational development. It is likely, however, that serious advances in this
field will flow from small and stringently defined projects intended to
accomplish these specific purposes rather than from a more unwieldy and less
economical operation such as the National Assessment. Of course, developing
the measures without developing extensive interest in employing them would
be unproductive, and so the fact that NAEP has encouraged such interest is
of major note.

An additional low profile objective of the National Assessment has been to 





evaluate the performance of young adults (twenty-six to thirty-five), a group
that has never been tested in so comprehensive a way before. NAEP has
succeeded in sampling and reaching this group, although as yet the data or
the interpretations of it have not yielded especially interesting findings.
Furthermore, NAEP has not taken advantage of this unusual opportunity to
raise fundamental questions about how much of what is taught in school is
forgotten by ages twenty-six to thirty-five. If retention of school-taught facts is
quite low, perhaps the most basic premises of formal schooling should be
seriously reconsidered.

C. Subordinate Objectives

7. To promote concern about more meaningfully defining
the nation’s educational objectives.

8. To provide comparative data to stimulate competition
among the states and local communities (without
encouraging invidious comparisons).

In considering objective 7, it is important to note again that the
Assessment’s objective booklets are attracting sizable interest from states and
local districts that are planning assessments, from curriculum developers, and
from some lay groups. On first examination, given the trend toward widely
diffused and disparate uses of these objectives, and the Assessment’s relatively
facile consensual model of defining national educational objectives, it appears
that NAEP has failed with regard to objective 7, and in fact, if one evaluates
this in the terms on which NAEP expected to achieve it, by meaningfully
defining national educational objectives, one must conclude that these efforts
have failed.

But since the United States has no unified national school system, the task
of defining nationwide educational objectives is manifestly difficult. It
involves highly complex educational, cultural, and political questions. Who, if
anyone, has the right or capacity to define national objectives? Should the
objectives include only what actually is done in the schools, or should they be
the stated goals of the schools, or should they be advanced normative
objectives which some experts decide ought to be the goals of the nation’s schools?
The answer to this last query is crucial, since different answers raise still other
important issues. If the objectives are linked to the existing practice or goals
of the schools, they have conservative tendencies, providing national sanction
of the present. By contrast, if they are well ahead of existing practice and
existing goals, students in many school systems are likely to perform
especially poorly and it may well appear that state and local control is being eroded,





even though some may believe that the new objectives are crucial if
education is to become meaningful. Further, how can the development of national
subject matter objectives be reconciled with the development of diverse
learning environments or any newly developing respect for educational and
cultural pluralism?

Most of the proposals for determining national education objectives are
basically centralized and consensual in their nature. For instance, John
Goodlad suggests: “We need a national body of leading citizens whose
primary purpose is to give continued attention to the formulation of educational
aims.”1

Philip Smith has proposed a similarly centralized group, but in this case
comprised of leading professionals:

the problem of objectives for American schooling is a
problem calling for highly competent professional resolution. The
determination of objectives that will give curricular force and
other operational meaning to the central purpose in the years
ahead calls for sophisticated theoretical, technical, and
administrative decisions.2

This second proposal is closer to the quasi-participatory model which the
National Assessment used, letting contractors establish lists of objectives [and]
then letting schoolpeople and prominent lay people review them. As we
demonstrate in the next chapter, however, NAEP’s consensual sty[le ...]
conflicting educational objectives has its d[...] toward attainment [...]
to guide policy decisions which in turn are meant [...] of [...].

There remains the subtle possibility, nonetheless, [... fail]ure
in this regard could lead to major progress. We [...]
but for the moment we should note that the exp[...]
nation’s educational goals could in itself ultimately abet [...] forms [of] governance
and encourage more meaningful determination [...].

Judging NAEP’s performance on objective [8 ...] comparative data
to stimulate competition among the [...] is particularly
exacting. Our opinion [...] that NAEP’s

1 John Goodlad, [...]





primary backers did not have strong or consistent convictions that the
National Assessment should collect data by states and local communities. When
viewed in this way, one can maintain that NAEP has not failed at objective
8; instead, the evidence is that NAEP has played a fairly major role in
encouraging states to set up assessment systems for comparing local school
districts. The NAEP sampling model, the objectives booklets, some of the
exercises, and some of the reporting techniques are being adopted (and
adapted) by various states, and by some local districts.

If one views objective 8 in a more normative sense, though, it can be
argued that NAEP’s long run utility to the nation in general, and to the
Congress and USOE in particular, is severely restricted by its failure to
collect uniform comparative data from all the states and a variety of
representative local districts. But as suggested earlier, this interpretation too can be
turned around to imply that having regional data might in some ways give
the federal government an advantage by allowing it to develop goals and
programs that need not be administered through either existing state
departments of education or local school districts.

B. Major Long Term Objectives

4. To continue collecting the national data at regular
intervals so that comparisons could be made over time
concerning national levels of achievement, performance in
various subject areas, and performance in various
subgroups, vis-a-vis themselves and other subgroups. This
would provide a census of educational progress in
America.

5. To forestall the development of “less effective or
misdirected” attempts at assessment. Some backers of the
Assessment disapproved of plans for a California assessment
and the national proposals of Admiral Hyman Rickover,
both of which were less interested in reducing inequality
than in separating elites from the average population so
“excellence” could be pursued efficiently.

6. To make international comparisons possible once
sampling and testing problems could be resolved.

Objective 5, “to forestall the development of ‘less effective or misdirected’
attempts at assessment,” is also difficult, and is directly related to the
foregoing discussion concerning objective 8. NAEP has certainly failed on
objective 5 if one assumes the intent was to eliminate, or almost completely
eliminate, “less effective or misdirected” attempts at assessment. Despite the
National Assessment’s many grave deficiencies, some of the state and local





[assessments] being developed are clearly inferior [in] their conception, design,
[...]. If [the] major [...] of assessment recommendations
from the ETS Center for Statewide Educational Assessment is any indication,
the situation is likely to get worse in the future.3
If objective 5 was to eliminate such inferior models of assessment, then one
might assume that the resolve in objective 8 was to collect uniform data for
states and many local communities. It appears from the transcripts and
related memoranda that NAEP’s leaders may not have noticed the importance of
the relationship between these two objectives.
On the other hand, if one takes objective 5 more literally, and more
nominally, the intent was merely to forestall inferior and misdirected assessments.
In this case, if NAEP’s model, materials, or methods are better than someone
else’s, then NAEP is succeeding every time a state or locality adopts some
aspect of the National Assessment program. Of course, this argument may be
spurious. Given transference of scattered parts of the whole, maladaptation
might be just as likely as functional adaptation.

Objective 6, to make international comparisons possible, cannot be
judged because there has been no real attempt to achieve it. It is worth
noting, however, that the IEA studies of educational achievement being
conducted in twenty-two countries would seem to eliminate the need for America
to prescribe the development of an international system.
It is also too soon to comment upon the achievement of objective 4, the
development of a long term census of educational progress in America. If the
Assessment has any major potential for the future, it is clearly in this area. But
important questions remain to be asked about what the probable utility of
such a census would be, especially given the present low returns and high
costs. After the evidence presented in Chapters 3 through 8, it is difficult to
support even the potential utility of such a census unless the present
Assessment’s conception and methods are radically revised. At this point we
can return to the question, “What uses can be made of the Assessment
results?”

We have seen that the Assessment has accomplished, or partly
accomplished, several of its objectives. But now, in considering its overall utility,
[we] must distinguish between these objectives and its three Major Short-Term
Objectives. Some of the objectives that have been achieved were largely
operational, including setting up the administrative organization and gaining
widespread public and political support for establishing the Assessment. These
could be regarded as major objectives only until the time they were achieved.
Other objectives that have been partly attained (including altering concepts
of testing, producing some advanced subject-matter objectives, and
facilitating the development of state and local assessments) are effective largely
through indirect influence; that is, rather than producing specific results, the
Assessment has exercised an influence generally on various components of
American education. In each of these instances, the Assessment may have
been quite important but must be rated only as one factor among many.


3 Nancy L. Bruno, Paul B. Campbell, and William H. Schabacker, Statewide
Assessment Methods and Concerns (Princeton, N.J.: Educational Testing
Service, 1972).


NAEP’S MOST CRITICAL OBJECTIVES 

In the long run, neither the operational objectives nor the influential
objectives merit current levels of public support ($6 million per year) for the
Assessment. While it clearly deserves credit for progressing toward its
objectives, ultimately the Assessment will have to perform well on its three major
short term objectives, redefine them substantially, or find some other major
objectives to pursue. The original and continuing major commitments of the
National Assessment are:

A. Major Short-Term Objectives

1. To obtain meaningful national data on the strengths and
weaknesses of American education (by locating deficiencies
and inequalities in particular subject areas and
particular subgroups of the population).

2. To provide this data to Congress, the lay public, and
educational decision makers so that they could make
more informed decisions on new programs, bond issues,
new curricula, steps to reduce inequalities, and so on.

3. To provide this data to researchers working on various
educational problems concerning teaching and learning,
either to answer some research questions or to identify
specific problems which would generate research
hypotheses.

No attempt will be made here to restate the detailed evidence of Chapters
3 through 8 which indicates that to date the Assessment is generally failing in
all three of its major short-term objectives. We summarize by noting some of
the many factors responsible for this failure, each of which has been
considered in detail earlier. They include the limitations of the selection of subject





areas, the delineation of subject matter objectives and subobjectives, the
development of exercises, the measurement of descriptive background
characteristics, and the combined use of the matrix sampling, the packaging system,
and short testing times. More generally, the results have been adversely
affected by the technical limits of social science, the politics of federal-state-local
educational relations, economic constraints, time constraints, the
particular intentions of the Assessment’s original leaders, poor management
planning, mediocre service by many of the major contracting groups, and by the
appearance of the Coleman Report and several subsequent reports detailing
major aspects of educational achievement, documenting inequalities, and
advancing standards for hard evidence beyond the standards employed by the
Assessment. Finally, beyond all these, there are additional defects inherent in
the way in which the results are reported.

These limitations leave the Assessment with (1) virtually no capacity to
answer research questions and even very little capacity to generate significant
research hypotheses that could not have been generated more precisely and
less expensively by smaller studies; (2) virtually no capacity to provide the
federal government, the lay public, or most educational policymakers with
results that are directly useful for decision making; and (3) most surprising of
all, virtually no really significant and supportable new findings with regard
to the strengths and weaknesses of American education.


NAEP’S RESPONSES TO THE LIMITED UTILITY OF 
ITS RESULTS 

NAEP has become keenly aware of the limited [utility] of the
results. To date, beyond [...] problems mentioned earlier, NAEP
[...]ed six major responses to the matter of lack of utility.

First, it has begun to put more emphasis on the importance of the
indirectly influential objectives considered [...] chapter. Second, it has
begun to publish somewhat more qua[...] about what it expects to
accomplish with its results. [Third, it has begun] disseminating its subject
matter objectives as if they were significant [ends] unto themselves, which
[in] some cases they are. Fourth, [...] to explain how the Assessment [...]
may be used. Fifth, it has asked for patience, [...] a benchmark and that
interesting findings and changes w[ill ...] or three cycles





have been completed. And finally, it has created a Department of
Applications and begun to pursue some new objectives. The second, fourth, and last
of these responses are discussed in greater detail below.

A variety of articles by NAEP staff members and friends of NAEP have
begun to be more humble about what can be expected of the Assessment.
Without explicitly abandoning the objectives of providing directly useful data
to policymakers and researchers, attempts are being made to reduce the
expectations for the Assessment without reducing support for it. To date, such
efforts contain numerous contradictions. In the October 1971 issue of Phi
Delta Kappan, the former staff director of NAEP and the then assistant to the
director for exercise development coauthored the following disclaimer, in an
article ironically titled “How Will National Assessment Change American
Education?”

A recurring concern both among those who support national
assessment and those who have reservations about it is the ultimate
utility of the results. How will they affect education in this
country? This is a very difficult question. While national assessment is
designed to provide general information, it is not designed to
produce answers to specific educational questions.

Certainly the originators of national assessment expected the
project to contribute to improved educational decision making.
Certainly they felt that better answers can be produced by
decision makers if they have more information. Their thesis was that
someone needed to begin, systematically, to gather information
(not answers) about educational outcomes in this country.

These originators were men and women of sufficient vision to
see innumerable possibilities for the use of assessment type
information: for legislators faced with hard decisions, e.g., whether to
allocate extra monies for instruction in reading if it means
refusing requests for other educational needs; for school board
members faced with the question of how to deal with educational needs
of disadvantaged groups or minority groups; for curriculum
specialists and teachers faced with decisions of how best to allocate
class time to educational materials related to specific goals.

But national assessment was designed to be just one
information gathering project to fill one information void. Well-designed
state assessments, local school district assessments, and special
research studies seeking answers to specific educational questions
would all be necessary to complete the picture.4


4 Frank B. Womer and Marjorie M. Mastie, Phi Delta Kappan, vol. 53, no. 2,
p. 118.





Despite these judicious remarks, later in the very same article we find the
authors suggesting various debatable ways in which members of different
interest groups might make direct use of the Assessment results. Numerous
articles of this sort have been published by NAEP between 1969 and the
present. In general, they seem counterproductive because they unwittingly
point up how commonplace the findings are, how imprecise they are, how
indirectly they relate to any particular constituencies or needs, or how the
same kinds of information could be obtained more readily, usefully, and
economically in other ways.

Sometimes the suggested uses for the results are incredibly unimaginative,
or, perhaps, quite imaginative, but also incredible. The following
appeared in a special issue of COMPACT magazine devoted entirely to the
National Assessment. Published in February 1972, the article from which this
excerpt comes appears side by side with others, some of which lower and
some of which raise expectations for the Assessment. The article is
entitled “Industry - An Unnoticed Consumer.” It begins by claiming, “The
importance of National Assessment to [...] publishers, teachers and other
educators is obvious.” It then goes on to [...] the implications of the
Assessment for American industry:

Industry will react favorably, for example, in considering
assessment results in terms of plant location. When an industry
considers a certain [...] for an industrial development, it will [...]
the educational competence of the people [...]. Important to any
industry would be information as to the [...] make-up of a
community: schooling, general [...] from experience, and whether
members of that community [...] self education after [...]. [...] but
from National Assessment results can this [...] be obtained?

Industries planning to hire [...] will want to know the levels
of knowledge and skills of their [...] employees. One of the
important industrial problems now [...] development of continuing
education programs for [...]. To plan adequate and useful [...]
programs an industry must first find out what [...] and can do.
If the industry plans to bring [...] employees in from another part
of [...], they will be [...]
as to the nature of school systems in different geographic regions
plus information on the overall make-up of communities.5

In fact, the National Assessment can provide none of the specific data
suggested in these passages.

At the same time that NAEP has stepped up its efforts to think of possible
uses for the Assessment results, it has also taken steps to find other uses for
itself and thus to assure continuing widespread support. In October 1971, a
Department of Utilization and Applications was formed within NAEP.
While this department overlapped with the existing NAEP Public Relations
Department to some extent, its top priorities were quite different. While there
was to be continuing interest in disseminating Assessment results, this new
department's main purpose was to begin disseminating the Assessment model, rather
than the results.

The department budget for the first year was approximately $150,000, and
for the second year approximately $450,000. The chief functions include
running workshops to inform states and local school districts of how they might
make use of the Assessment models, either in terms of the processes for
determining subject matter objectives or in terms of using NAEP exercises, or
the NAEP sampling, packaging, or reporting systems, and so on. The
department works directly with individual state offices of education and also with
local school districts.

This change is important for two reasons. From our interviews at NAEP it
seemed apparent that the new emphasis on disseminating the model instead
of the results would draw priorities and resources away from the really
substantive revisions which are necessary to make the Assessment's results more
useful in the long run. Second, it seemed clear that there were strong political
implications in the creation of the new department and the emphasis being
given to it. In part it was established because USOE was under pressure from
Congress to indicate how the Assessment would be useful, and USOE, in
turn, applied that pressure to NAEP. The departmental change gave NAEP
a chance to deemphasize its shortcomings and to start building a new set of
constituents who could then testify as to its utility before USOE and
Congress. In fact, the Assessment's model and materials may well be more useful
at the state or local level than at the national level. But since NAEP and ECS
are controlling this process, there is great emphasis on allowing the states
leeway to develop their own assessments, which are bound to vary
considerably in quality just as they are bound to collect noncomparable data.


5 John K. Wolfe, COMPACT, February 1972, p. 7.






RECOMMENDATIONS: THE FUTURE OF THE
NATIONAL ASSESSMENT OF EDUCATIONAL
PROGRESS


What should be done with the National Assessment of Educational
Progress? It has accomplished its original operational goals and has been one
major source of influence on various aspects of American education. But it is
generally failing in its three major objectives: the collection of meaningful data
on the strengths and weaknesses of American education, the provision of
directly usable information to decision makers, including politicians,
educators, and lay people, and the solution of research problems, or at least the
specific generation of important research hypotheses.

(1) We recommend strong but constructive skepticism regarding the
Assessment's future. One proposal for action, coming from NAEP's leaders, is
the same as it has been since the fall of 1963: Everyone should wait patiently
because the “first cycle is only the benchmark and we really can't know what
results we'll have until the second or third cycle.”

Among the numerous difficulties with waiting longer is the high cost
involved. If it takes three cycles to know whether anything interesting has
turned up, and the cycles are five years each, at an average annual budget
over $6 million, some $90 million will be spent waiting to see.
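
The arithmetic behind that figure can be checked with a short sketch. The
Python below simply multiplies the three quantities named in the paragraph
above (three cycles, five years per cycle, and an annual budget taken at its
stated floor of $6 million). The function and variable names are illustrative
assumptions and do not come from NAEP or from the study itself.

```python
# Figures are those quoted in the text; the names are hypothetical.
CYCLES = 3                   # cycles needed before results can be judged
YEARS_PER_CYCLE = 5          # length of one assessment cycle
ANNUAL_BUDGET_MILLIONS = 6   # text says "over $6 million", so this is a floor


def waiting_cost_millions(cycles, years_per_cycle, annual_budget_millions):
    """Return total spending, in millions of dollars, over the waiting period."""
    return cycles * years_per_cycle * annual_budget_millions


total = waiting_cost_millions(CYCLES, YEARS_PER_CYCLE, ANNUAL_BUDGET_MILLIONS)
print(f"At least ${total} million over {CYCLES * YEARS_PER_CYCLE} years")
```

Because the annual budget is described only as exceeding $6 million, the
product should be read as a lower bound on the cost of waiting.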

But leaving cost aside for the moment, the plea for patience [...]
examined more closely in its own right. It is easy to recognize that [...]
think through everything the Assessment might [...] idea first
came up in 1963, but difficult to understand the origins of the argument that
the first cycle is “just a benchmark” and that it will be the comparisons over
time that will be really interesting. [...] the Assessment's work involves
testing four age groups (including young adults) in ten subject areas, at three
different levels of difficulty, in the four categories of knowledge, skills,
understandings, and attitudes. Furthermore, the subject areas include Citizenship
and a broadly defined area called Career and Occupational Development,
both very interesting areas which are seldom tested across such a spectrum of
the population.

Our impression of all this is that the Assessment potentially constitutes one
of the most significant acts of self consciousness ever undertaken by a nation.
Though many of its aspects might have been more broadly defined, the
Assessment even now covers [...] educational status more comprehensively
than any other such venture. Because of this, it seems to us that the first
round might be very [...] in terms of what it tells about the society.
Concomitantly, we feel relatively certain that most of the changes over a



five year cycle will be small, only reinforcing the significance of the baseline
data. Furthermore, we would suggest that more attention be paid to the first
cycle by sociologists, historians, and political scientists, looking at it as a
measure of the overall literacy of the society. This type of analysis is essential,
in part because the individual results are so difficult to interpret precisely.
Moreover, major changes in results from one cycle to the next are likely to do
no more than stir debate as to whether the change is a result of schooling, or
some societal factor, or some flaw in the testing situation. We have looked
through exercise after exercise, positing upward and downward changes and
then asking, “So what?”

In a way, it seems that at this point the proposal to wait patiently is partly
a defense to cover up for the anxieties at NAEP about how the Assessment
results can really be useful. In fact, NAEP itself is not waiting patiently; quite
to the contrary, it has been busy “cutting and pasting” in an effort to
eliminate some of the most elementary deficiencies of the first testing. Being
patient may in the end be excellent advice, but in the meantime drastic changes
must be made, both in the societal expectations for the Assessment and in its
own sense of purpose.

(2) It should be made very clear that the Assessment is not a national
“school accountability” system. It cannot directly measure the outcomes of
schooling, nor can it measure the effectiveness of any particular input of any
school, curriculum, federal program, or state school system, etc. If it could
measure outcomes, causal factors, and effectiveness it would obviously be extremely
useful for both research and policy purposes, but it cannot, as we have
demonstrated in Chapter 6.

(3) It also should be made clear that the Assessment is not a short term
research or decision making tool. We are convinced by the evidence in
Chapters 3 through 8 that the National Assessment is not, and cannot
economically be, an effective mechanism for the solution of educational research
problems, for the direct generation of research hypotheses, or for the
provision of information that will be of significant and immediate use to
educational policymakers or the lay public. We have not been able to think of any
educational research problems or major policy matters that could be most
directly and precisely clarified through the mechanism of the National
Assessment. It is enlightening in this context to return for a moment to the
original planning meeting of December 19, 1963. John Flanagan has just
finished proposing that the Assessment should try to measure aspirations
regarding the kinds of courses students want or whether they want to go to
college:





CHAIRMAN GARDNER: Well, I think, John, those are extremely
interesting items I think they’d be a little hard— or the kind of 
things you might want to try out on a smaller scale before you 
plunged in on a national sample, don’t you think? 

MR. TUKEY: Is there anything in this for which this isn't true?
(Laughter.)

MR. HOLLAND: That's my answer! (Laughter.) 6


The matter of whether the Assessment is an appropriate instrument for the
conduct of serious research or data gathering for policymakers has become
quite grave since that first meeting. During our second session of interviews at
NAEP headquarters in Denver, in August 1972, a representative of the
National Center for Educational Statistics (NCES), the monitoring office
within USOE, was no less than startled to find that Tyler confirmed our limited
view of what the Assessment could accomplish in the short term, while
NAEP's administrative officials, James Hazlett and Stanley Ahmann,
remained silent on the issue.

At some point in the near future the Assessment may have to be more
candid about these research and policy limitations, although we recognize
that an admission that even with a lot more work the Assessment is likely to
become a useful long term census and little more may jeopardize funding.
The risks are high either way, but we believe that a careful lowering of
expectations, coupled with a great deal of timely excitement about how
America is mature enough to be able to afford a longer term attempt at
educational self consciousness, might leave the “census like” Assessment [...]

If NAEP is unable to back away from its commitments to generate specific
data for research hypotheses, the most direct way in which to increase its
utility in this regard may be to expand the testing periods and the [...]
so that individuals can be tested for meaningful relationships among [...]
performances in different subject areas, across difficulty levels, and across the
categories of knowledge, [...] and [...] be emphasized [...] would
undoubtedly be pursued more economically in the context of a more
specialized effort.

(4) [...] priorities toward disseminating the Assessment [...]
NAEP is to become a federally subsidized clearinghouse to help states and

6 Proceedings of the National Testing Project Conference December 18-19 1963 
Carnegie files p 13° 



10 


EPILOGUE: 

SOCIAL INDICATORS AND 
THE REFORM OF EDUCATION


Cast out from the Eden of understanding, the human
quest has been for a common tongue and a unity of
knowledge, for a set of “first principles” which, in the
epistemology of learning, would underlie the modes of
experience and the categories of reason and so shape a set
of invariant truths. The library of Babel [Jorge Luis
Borges] mocks this hubris: like endless space, it is all there
[...] like Gödel's theorem, knowing [...]
is a contradiction makes it not a contradiction. In the end [...]

[...] Sheldon and [...] the National Assessment [...] matters alone,
but must consider the development of social indicators as [...] American
[...], characterized, in Daniel Bell's analysis, by creation of a service
economy, the preeminence of the
professional and technical class, the centrality of theoretical knowledge as the
source of societal innovation and policy formulation, and the evolution of a
new intellectual technology. Over 50 percent of the United States labor force
is now employed in the service sector (which includes the personal services, as
well as trade, finance, transport, health, recreation, research, education, and
government), and services make up more than 50 percent of the national
product.1

Arguments for the development of social indicators usually rest on the
theory that the transition to a service economy will create particular strains
in the market system. In Western industrial society, the market has long been
the forum for the allocation of value, in which first the “invisible hand” and
later national monetary and fiscal policy ensured that decisions by individual
economic actors sustained the economy's workings. In the postindustrial
society, collective decision making increasingly may replace the market in the
allocation of value.

Microeconomic theory teaches that a perfectly competitive market
economy is optimal (in the Pareto sense) whenever private costs equal social costs
and private benefits equal social benefits. The economy will underproduce or
overproduce certain goods if these conditions are not met. There is
substantial agreement among welfare economists that, for education, health,
transportation, and other social services, equality of private and social costs and
benefits is not achieved. The social benefits of [...]
are possibly greater than the sum of the private benefits [...]
educated individual. For services such as education and health, often called
collective goods, perfect competition in [...] will not
allocate resources optimally. The [...] individual
economic actors will differ substantially from decisions that might be made
collectively by the population.2

Social indicators are expected to improve [...] individual and collective
decision making by illuminating [...], for example the
functional literacy of the population, its health, the incidence of crime, or the
level of pollution. It often is imagined that social indicators might produce a
balance sheet useful in clarifying policy choices or delineating performances
in areas of defined social need.3

1 See Daniel Bell, The Coming of Post-Industrial Society (New York: Basic
Books, 1973), and Robert L. [...]

2 [...] pp. 33-97.

A considerable range in the complexity of data gathering has been
evidenced in proposals for social indicators. The simplest possibility of all is the
one-time measurement of a single item, for example, the present birth rate,
literacy level, or size of population. Here, one simply seeks to discover the
present value of some variable. A somewhat more sophisticated exercise is the
collection of time series data. For example, the U.S. census provides
population trends for a two hundred year period. A third possibility is the
construction of an accounting system in which the collected data form an
interrelated network. For example, the national income and product accounts indicate
the total amount spent on final output by all participants in the economy, the
total income earned by participants for productive services, and the sum of
the output produced by all producing units.4 Finally, data can be collected
which play an important role in a theory of some social process. For example,
the price index, unemployment rate, and investment, government, and
consumption expenditures are all important elements of neo-Keynesian
economic analysis.5

Whatever data are collected, characteristics of large organizations
inevitably influence the ways data are used. First, it is far more likely that an
organization will develop indicators of progress toward long standing goals
than toward those recently established. Second, indicators which are positive,
i.e., which show progress, are more likely to be used than those which are
negative. Third, those variables which are easy to measure or which are
needed for administrative purposes will be obtained before those less
observable or not so greatly required for administration. Fourth, indicators seem to
manifest cultural lags, i.e., indicators fail to keep abreast of the techniques of
statistical measurement, new indicators are not developed to meet new needs
for information, and indicators fail to change in the manner needed to reflect
alterations in the phenomena for which they are an index. Fifth, indicators
are most likely to be developed in a policy area in which there is a large
interest group providing pressure. This may or may not mean that such a
policy area merits statistical concern or that areas for which there is little
public notice necessarily merit small concern.6

3 These particular objectives for social accounting were suggested by the
National Commission on Technology, Automation and Economic Progress. See
Raymond Bauer, ed., Social Indicators (Cambridge, Mass.: MIT, 1966), pp. xiv-xv, and
Daniel Bell, “The Idea of a Social Report,” The Public Interest, no. 15, Spring 1969, p.
78.

4 For a description of the national income and product accounts, see any
good macroeconomics text. One excellent presentation is available in Warren Smith,
Macroeconomics (Homewood, Ill.: Irwin, 1970), pp. 23-89.

5 For a particularly elaborate proposal for a social accounting system, see
Bertram M. Gross, “The State of the Nation: Social Systems Accounting,” in Bauer,
op. cit., pp. 154-272.

Consideration of organizational constraints may expand and improve the
use of data. For example, as Albert Biderman suggests, paying attention to
the way knowledge is used might help identify


(1) Institutional, political and other barriers that are responsible
for gaps in data about crucial social phenomena. These findings
may point toward ways of increasing the correspondence of
available measures of aspects of the society with their perceived
importance and desired state.

(2) Barriers to the comprehension and use of data in planning
and administration.

(3) Sources of distortion and bias in the recording and reporting
of social data.

(4) Neglected pertinences of data to social value.7

Study of the way knowledge is used might also call attention to the
nonscientific roles data can play. Biderman suggests that data can be employed

(1) as the bases of claims [...] (2) [...]
devices established by law or [...]
various parties to the [...] procedures of interorganizational
politics, (3) as the cohesion [...] alliances, (4) as
symbols for the persuasion of publics and [...], (5) [...] new grounds for
national and institutional creeds.8

Organizational structure and practices [...] have a strong influence on
both the kinds of social statistics gathered and [...] which they are
used. Any attempt to use social statistics [...] collective decision making
therefore would seem to require an organization with relatively clear and distinct
goals and practices, features not usually characteristic of education or other
social services.9

6 Albert D. Biderman, “Social Indicators and Goals,” in Bauer, op. cit., pp.
68-153. Biderman's essay includes a [...]

7 Ibid., p. 73.

8 Ibid., p. 102.

Further, if we assume that the primary (although perhaps not single)
purpose of social data gathering is to provide empirical evidence for scientific
analysis of public policy, then the logical position of such data within the
social sciences must be made clear. That is, if it is to be employed in the
service of science, it must be collected in accordance with the imperatives of
the science involved. Data do not speak for themselves; use of theory and
empirical evidence are closely linked. Only when evidence is collected with a
specific hypothesis in mind, in the hope of either supporting or refuting the
hypothesis, can appropriate controls be developed to exclude irrelevant
events which would otherwise mask the phenomena to be observed.10

It is unlikely, then, that item-by-item collection of general social statistics will provide
information for rigorous hypothesis testing, unless the data is collected with a theory in
mind.

It is possible that generally collected data may provide some aid in the
generation of hypotheses. This may be particularly true if the data describe
items which are quantified in ordinary usage (such as age, number, size,
dollar value). But variables to be measured are constructs; they are not
given by nature, but must be defined by the experimenter. Hence, even
within a seemingly well-defined policy area such as health care, there is no
guarantee that the variables chosen for a general census would be those useful in
theory-based analysis.


9 For discussions of the broad aims of social action programs, see Robert S.
Weiss and Martin Rein, “The Evaluation of Broad Aim Programs: Experimental
Design, Its Difficulties, and an Alternative,” Administrative Science Quarterly, no. 15, p.
97, 1970, and Stephen Bailey and Edith Mosher, ESEA: The Office of Education
Administers a Law (Syracuse: Syracuse University Press, 1968).

10 One can argue further that, in order for empirical evidence to be useful in
human activity, it must be collected as part of an ongoing life practice of criticism and
hypothesis testing. For an analysis of this thread in Marxism, existentialism,
pragmatism, and analytic philosophy, see Richard J. Bernstein, Praxis and Action: Contemporary
Philosophies of Human Activity (Philadelphia: University of Pennsylvania Press, 1971).
For the more narrow view that “data do not speak for themselves,” see Ernest Nagel,
The Structure of Science (New York: Harcourt, Brace & World, 1961).

Science also can be understood as an effort to interpret the world, to create
symbolic frameworks through which the world appears to “make sense.” In this view, the
methodological concern is not in arranging experimental controls for hypothesis
testing but is in understanding evidence within a broader conceptual structure. See Peter
Winch, The Idea of a Social Science and Its Relation to Philosophy (London: Routledge,
1958).





It becomes apparent that certain problems are attendant upon the
collection of scientifically useful census data. The essential point is that unless such
data is gathered by a community which shares fundamental assumptions
about the phenomena observed, it is unlikely that the data can have scientific
utility. But in reference to any important public policy matter for which it is
hoped such data will have value, there is surely no single community of
assumptions.

Biderman lists six technical obstacles to developing an adequate and respected system of
indicators: invalidity, inaccuracy, conflicting indicators, lack of data, incompatible models,
and value consensus.11 As we have suggested, these obstacles can be explained as
consequences of the attempt to develop empirical evidence in the absence of
shared theory.

The transition to a service economy may well bring, then, not a new social
balance sheet to supplement the GNP, but rather a heightened recognition of
fundamental conflicts and contradictions in social policy. The idea of a
balance sheet for collective decision making rests on the possibility of an authentic
consensus, which, at least in education, rarely exists.

But conflict, beyond appearing inescapable, may actually be an advantage.
Paul Feyerabend 12 has suggested that pluralism is [...] the
advancement of knowledge. [...] one field until [...]
and refuting facts are uncovered. This [...] might also be
applied to social statistics (as noted below).
Rather than a single set of agreed-upon [...] a field such as health or
education, the federal government might [...] by a number of
different communities, each having its own [...] and values. Presumably,
over time the competition between theories (i.e., the competition
for resources) would lead to the demise of certain indicators and the
growth of others.

It must be admitted that for the single [...] developed example of social
statistics, the economic indicators, [...] theoretical frame, neo-Keynesian
analysis. There is no [...] in the economic indicators.
Nevertheless, there is [...] debate concerning definitions
of “unemployment,” [...] (for example, human as well as

ot,pp 281 





physical capital), “national product,” and other elements of the economic
data base. One wonders whether, as the consensus on the meaning of these
fundamental terms erodes, the scientific policy use of economic statistics will
decline.


SOCIAL INDICATORS IN AMERICAN EDUCATION 

We can conclude from this discussion that the assumptions underlying an
organization’s system of social accounting will reflect the assumptions underlying the
organization’s understanding of its own activity. The scientific utility of a system of
indicators is in the most fundamental sense a function of the correctness of the
organization’s understanding of its goals and structure.

An organization that would establish a social accounting system must first
come to understand its aims and the way in which it is structured to attain
them. Second, it must devise a system of measures to ascertain its effectiveness
in achieving its goals. Finally, on the basis of its theoretical understanding of
its structure, it can change policies in the service of realizing more effective
performance. The development of organizational consciousness, the first stage
of social accounting, inevitably uncovers intrinsic contradictions in an
organization’s structure. Awareness of these may or may not lead to a new
theoretical synthesis and the political action required for structural change
and growth.

In considering the Assessment from this point of view, the organization in
question is not the National Assessment of Educational Progress, but rather
the entire American educational system. Despite the long-term trend toward
more collective decision making, and despite the movement toward more
decision making at the federal level, it is obvious that insofar as one can speak
of American education as an organization at all, its understanding of its own
goals and structure is highly ambiguous. Predictably then, the educational system
has created a social accounting system, the National Assessment of Educational
Progress, in its own fragmented and jumbled image. And just as predictably, failing to grasp
the salience of the diverse practices and goals of American education, the National
Assessment has been plagued by all the dilemmas of invalidity, inaccuracy, conflicting indicators,
lack of data, incompatible models, and continuing lack of value consensus. While NAEP
received most of these dilemmas as a direct legacy from the American educational
system, it then made numerous decisions (as described throughout this
study) which exacerbated them. In addition, various factors considered earlier
have placed NAEP pretty squarely in the tradition of other large institutions
collecting social statistics, and, as mentioned by Biderman above, this
means overemphasis of long-standing goals, overmeasurement of easily measured
phenomena, and lagging behind the most sophisticated techniques of
statistical measurement.

The Assessment’s style of resolving basic conflicts ensured that no consistent
view of American education would emerge, and at the same time, that
a functional pluralism could not develop. Rather than encouraging either
one or several consistent views of the educational process, the Assessment
promised to stay as close as possible to present administrative practice in
education (i.e., knowledge was by and large to be construed as falling within
subject matter areas, progress would be measured by exercises, and outcomes
which “the schools are not trying to teach” would be neglected). In reality, as
noted in Chapters 3 and 4 and elsewhere, the Assessment strayed from this
promise, both wittingly and unwittingly, and the result was a still more
inconsistent view of the goals of American education. Theories of cognitive
development of some standing in the research community, [...]
Piaget’s and Jerome Bruner’s work, were neglected partly because [...]
still in the process of being validated in experimental situations and partly
because no developmental theory of any kind has ever [...]
by the United States educational establishment. [...]
nation, an inevitability given the varied [...]
theory is important only after it has entered [...]. Furthermore, no
effort was made to create a consistent system of [...] developing at
least minimal, complexly defined functional [...].

As we have remarked, when social [...] community
with a weak set of shared assumptions, [...] various [...]
and therefore have little precise theoretical or programmatic use. However,
as was also mentioned earlier, this does not mean that they will not be used
at all. Instead of being employed scientifically, they will be used like any other
political resource.

We have commented that agreement on fundamental issues is difficult to
obtain in American education and [...] obtained by the Assessment, facile
consensus, rather than [...] competing values, ruled its
process. We must expect, then, few results in the development
of rational decision [...].

We might ask under what [...] conceivable that systematically
usable [...] gathered [...] the field of [...].
One hope is that a new general theory will be published [...]
the spirit of the times. One can only guess, of course, whether a tract in
[...] Keynes [...] written [...] relatively clear policy implications of a [...].

Another hope, doubtlessly more likely and certainly more modest, follows
the example of client-oriented therapy more nearly than that of economics.
Individuals and organizations may come to realize that acts imply consequences
and that real choices are possible. If this were the supposition of large
numbers of people and organizations, information, especially greater self-knowledge,
would become a more valued resource in the process of growth.


EVALUATION: WHEN IS SELF-KNOWLEDGE ACTED 
UPON? 

[...] concluded that the National Assessment will have very limited
[...] of either specific research hypotheses or specific
policy guidelines. Beyond that, our analysis of the prerequisites of an effective
[...] social accounting [...] that no such system lies
within [...] of education. We have also expressed our pessimism
about [...] development of a general theory, which might serve as a
basis for an effective system of educational/social indicators. And finally, we
[...] the most likely approach to educational
change [...] encouraging the use of increasingly refined self-knowledge by
particular educational institutions.

[...] and the hope [...] change occurring
more often [...] institutional level, it is helpful to consider various
[...] the institutional use of evaluations and self-knowledge.
[...] only incidentally with the many psychological and social factors
[...] educational systems and other organizational settings to
rigidify and [...] own attitudes and behavior patterns; the literatures of
[...] psychology, organizational development, change theory, and learning
[...] manifest beyond the point of reasonable doubt that
[...] common characteristics of people in institutional settings.
Therefore, [...] present purposes, we take human weakness and inefficiency
(which [...] viewed as strength and sensitivity) as givens. The
[...] describing [...] which help surmount or
circumvent [...] factors when necessary. In other words, the question here is
not the broader one of how to make all people in educational organizations





continuously crave change and utilize critical evaluations, but rather how to
create conditions which induce or force people to overcome habitual resistance
to change and critical evaluations.

What cannot be considered to be a given, as we have seen in the foregoing
analysis of the National Assessment, is the availability of high-quality evaluative
information. Decision makers in organizations might be faulted for not
craving information, but it is more difficult to blame them for not using the
highly imperfect information which is accessible to them. In general, Wilensky 13
and others tend to give the benefit of the doubt to the quality of the
information and to see decision makers as refusing to act on the basis of
adequate information. That is, the quality of information gathering has far
outpaced its utilization. A much more extreme case of attempting to assume
high-quality information is found in Gary M. Andrews and Ronald E.
Moir’s volume Information-Decision Systems in Education, 14 a 1970 publication
which emits the evangelical tone of the nineteenth-century itinerant mountebank,
a tone not very different from that of too many twentieth-century
salesmen for PPBS or other educational management information systems.
Analysis of the problem of underutilization of information is retarded by such
efforts to separate the issue of quality of information from the [...].
This is not a case of total disagreement with Wilensky; it is more a matter of
emphasis. Indeed, the best support for the contention that the quality and
utility of information are inextricably bound together is [...]
excellent definition of high-quality information. It is much more than merely
pragmatic and well organized. It is

clear because it is [...];
timely because it gets to them [...];
[...] diverse observers using the same procedures see it in the same
way; valid because it [...] of concepts and measures
that capture reality (the tests include logical consistency, successful
prediction, congruence with established knowledge or [...]);
[...] high probability of attaining [...]
new goals are suggested.


13 Harold L. Wilensky, Organizational Intelligence: Knowledge and Policy in Government
and Industry (New York: Basic Books, 1967).

14 [...] Peacock, 1970.

15 Wilensky, op. cit., p. ix.





Arthur B. Toan, Jr., a partner in Price Waterhouse and Co., agrees on the
need for timeliness, reliability, and clarity, and adds two additional criteria
that seem reasonable. First, information must be of a significant or vital
nature to the organization involved. Second, evaluations should be accompanied
by appropriate bases for comparison. In the latter context, Toan suggests
five useful possibilities: comparisons against the past, the present, a plan for
the future, an ideal, and competitors. 16

The main point here should be quite obvious. When have school systems
[...] evaluation meeting criteria such as [...] set down by Wilensky
and Toan?

Certainly there are available great quantities of neglected information of
good quality, but can use really be expected until either information is of high
quality or there is very strong incentive for innovation? Experience in other
[...] indicates that, lacking high-quality information, predictable use is
unlikely. For instance, problems with the quality of information have been
documented and analyzed with regard to sociological research and its use
and nonuse in such diverse fields as medicine, business management, education,
and American foreign policy. 17

[...] planning and evaluation [...] education has been especially
weak, so that if it improves, it will be necessary to reeducate decision makers
[...] circumstance. [...] we know, information on major education issues such
as [...] size, per-pupil expenditures, teacher experience and qualifications,
[...] highly contradictory and largely incorrect. In more
general terms, [...] it usually is possible to redo
a study [...] the data is qualified if not refuted.
[...] a somewhat rare occurrence [...]
prompts leadership to take evaluation
evidence [...] destructive [...].

Indeed, there is substantial evidence that even in the business world with its 

16 Arthur B. Toan, Jr., Using Information to Manage (New York: Ronald, 1968),
pp. 6-7.

17 [...] Hyman on medicine, Abraham Zaleznik and Anne Jardim
on management, [...] Fishman on educational establishments, and
W. Phillips Davison on [...], in Paul F. Lazarsfeld, William H. Sewell,
and Harold L. Wilensky (eds.), The Uses of Sociology (New York: Basic Books, 1967).

18 Martin W. Essex, Educational Evaluation, 1969, p. 76.





relative precision, it is highly probable that competing information systems
will produce different evaluations and implications for policy. 19

As a matter of emphasis, then, it seems important for change agents seeking
to develop accountability and evaluation systems to keep the issues of quality
and comprehensiveness of information in the forefront. Furthermore, they
should seek to inform decisions that will be made in the face of inevitable
uncertainties. Many branches of the business world have already adjusted to
this provision of uncertainty, which is clearly more ingrained in schools and
other service-oriented organizations whose specific goals and output-measuring
capacities are less specific. 20

H. N. Broom 21 and Ijiri, Jaedicke, and Knight 22 discuss decision making
under conditions of uncertainty, or what the latter authors call (perhaps
unfairly, because they imply some pathological state) “ill-structured
environments.” Broom states:


The decision-making model most common to the business and
managerial field is that involving [...]
but usually not total ignorance. [...] may exist when some
objective experience is available but not enough [...]. The case
tends to approach risk, but decision [...]
that a probability distribution is [...]. [...] objective
experience decreases, the decision must rely more
upon the personal feelings or judgment [...] of
various strategies. Significant problems in the field are to
determine (1) when the objective experience becomes so small that
it should be replaced by a [...], (2) [...] be used to
combine effectively subjective probability information, and (3) in
general, [...] a confidence measure for assigned probability
distributions. 23


19 See Edward H. Caplan, “Behavioral [...]”
[...] N.J. [...].

20 For [...] discussion of problems [...]
in the field of mental health see [...]; [...] Witmer and Edith Tufts,
“The Effectiveness of Delinquency [...]” ([...] Social Security
Administration, Children’s Bureau, no. [...], HEW, 1954) [...].

21 H. N. Broom, Business [...] ([...] N.J.:
Prentice-Hall, 1970), p. [...].

22 Op. cit., p. [...].


[...] accountability and evaluation systems,
in [...] and presentation of information, should take
[...] of most decision making and prepare [...]
positively with the situational politics and psychology
which [...] bound to be a major component [...] the decision making [...].

[...] of obtaining high-quality information, even though
[...] available, and assuming that [...] time
[...] arrived at through [...],
various case studies and analyses [...]
information collected by an organization [...]
in enhancing the use of information [...]:
the degree of urgency or (internal or external)
pressure for change, the [...] to which evaluative
information is deliberately gathered [...],
the [...] and degree of specialization of the organization,
[...] the degree of population homogeneity,
[...] “outsiders” [...] evaluations,
[...] participation in evaluations, the use of competitive
evaluation, [...] of evaluation [...],
[...] of professional or other reward systems, the
degree of [...] decentralization, the degree of openness
[...] evaluation, and the use of neutral professionals
as [...]. 24


23 Broom, op. cit., p. 60.


24 [...] Ladd, [...] Educational
Policy (New York: McGraw-Hill, [...]); [...] Marrow and John R. P. French,
Jr., “Changing a Stereotype in Industry,” Journal of Social Issues, vol. 2, no. 1, pp. 33-37,
1945; Howard E. Freeman [...], “[...] Approaches to Assessing Impacts of
Large Scale Intervention Programs,” American Statistical Association Proceedings, 1964,
pp. 193-194; Philip H. Thurston, [...] (Boston: Harvard
Business School, Division of Research, [...]); Toan, op. cit.; Karl Mannheim, “Roots
of the Crisis in Evaluation,” in [...] Essays of a Sociologist
(Routledge, 1943), pp. 15-30; Elting E. Morison, “A Case Study of Innovation,” in





THE NEED TO BE SITUATION-SPECIFIC 

Consideration of these numerous conditions and approaches that increase
and decrease the utilization of evaluations leads to a predictable but generally
neglected conclusion. There are many factors that tend to distort and block
the use of evaluations by organizations and, short of radical transformation of
the overall milieu, these factors will be found in highly varied combinations
from one organization to another. There also are many approaches to increasing
utilization of evaluative information, but any particular one of these
is useful only in a given situation in which particular factors are causing
distortion and blockage. The implications for federal, state, local, and private
evaluation policies are apparent. Within the existing pseudo-decentralized
framework, any evaluation scheme which assumes that conditions blocking
and favoring change are the same (or even largely similar) in each [...]
different organizations and institutions will not succeed in bringing [...]
basic changes. This would apply for a large school system which attempted to
evaluate and change the local schools comprising it, as well as for a state
evaluating local school districts, a foundation evaluating [...]
sites, or the federal government evaluating states or local districts.

Instead of taking for granted that rationality [...]
political factors are germane in [...],
[...] must begin designing evaluations [...] assumption that each local
or state educational institution has [...] organizational, cultural, or
psychological idiosyncrasies that make criticism and change more
or less difficult. Whether federal, state, local, [...] officials are ordering
or conducting the particular [...], they will need
situation-specific approaches [...] induce higher utilization
of the evaluation results. This [...] highly systematic consideration of
the kinds of issues discussed in [...], as well as more sophisticated
training for evaluators in the [...] inducing changes and altering
attitudes and behaviors. In fact, once situation-specific work has begun
[...] earnest, it will become clear that [...] difficult as intimated
above. Every situation will contain [...] idiosyncratic aspects, but
more often than not the situation’s key [...] will fit one of several

[...] (eds.), The Planning of Change (New [...]);
[...] ([...] Little, Brown, 1967), chap. 13.





recurring patterns, for which evaluators should eventually be able to develop
particularly effective patterned approaches. The point is that an understanding
of the patterns of the differences among organizations (in terms of
their receptivity to change) will be more precise and effective if it emerges from an
initial assumption of pervasive uniqueness, rather than from the present assumption that a
few approaches should work in every case.

These observations are applicable at each level at which evaluation of
educational programs and organizations might occur. But they seem especially
relevant now at the federal and state levels, where so much investment is
being made in accountability systems designed to improve the delivery of
educational services at the local level.

It is important to return to the National Assessment here, to note that it
has been one of several potent forces serving as a new impetus for an institutional
creed; that is, the social accounting movement in education is spreading
all across the country at a rapid pace, often in imitation of the
Assessment’s style. Insofar as any refined accountability or evaluation systems
at the federal or state levels might become effective in securing basic changes
in education, ultimately they will have to rely on precise knowledge of the
diverse goals and situations in a variety of local systems.

But a major subtlety in all of this deserves final mention. As we earlier
noted in detail, the Assessment’s consensual style obscures the fundamental
conflicts in the goals and structure of American schooling and makes any
fundamental resolution impossible. This indicates that the data collected will
be of little immediate use in more closely approximating the nation’s labyrinthine
educational goals. Instead, the ultimate major function of the National Assessment
may be to focus attention on the contradictions, and the failings, of the overall
organization of schooling in America. If this is so, the long-term effect will probably
be to reinforce the trend toward greater educational decision-making power
at the federal level.

This change might veer in the direction of greater uniformity in schooling,
whereby the goals of public education become limited to the transmission of
certain widely agreed-upon skills, or change might take the direction of a
new freedom whereby equality and basic training for functional literacy remain
as goals but are defined in terms of an increasingly diverse range of
respected educational outcomes, many of which might be determined at the
local level.



PART TWO 


RESPONSE OF

THE NATIONAL ASSESSMENT

OF EDUCATIONAL PROGRESS


The following reply to Mr. Greenbaum’s 1973 report was
written in early 1975. Because Mr. Greenbaum did not update his material,
we felt it would be unfair to make more than minor additions to our response.
Readers should know, however, that changes at National Assessment render
obsolete some of the descriptive information and criticism in this report. For
complete information about the current National Assessment, please write to
National Assessment Project, Suite 700, 1860 Lincoln Street, Denver,
Colorado 80203.


Roy H. Forbes
Director 



1 


GOALS and 

ACCOMPLISHMENTS of the 
NATIONAL ASSESSMENT, 
1969-1975 


Several years ago, members of the Carnegie Corporation
decided to review a number of the programs and projects to which they had
contributed developmental funds. Consequently, they commissioned Mr.
William Greenbaum, of Harvard University, to review the National Assessment
of Educational Progress (NAEP). His study, entitled Measuring Educational
Progress: A Study of the National Assessment, is published in Part One of this
book.

The staff of the National Assessment of Educational Progress was invited to
respond to Mr. Greenbaum’s review. The following paper, largely under the
editorship and authorship of Dr. Rexford Brown of the National Assessment
staff, represents the National Assessment of Educational Progress view of its
goals, accomplishments, and efficacy over the years.

Contributions to the following chapters in Part Two of this book were
made by many individuals. Prominent among those who made substantial
suggestions, however, were the members of the National Assessment of Educational
Progress Analysis Advisory Committee: Dr. Frederick Mosteller,
Harvard University (present chairman); Dr. John Tukey, Princeton University
(past chairman); Dr. Lincoln Moses, Stanford University; Dr. James
Davis, Director of the National Opinion Research Center; Dr. Janet Elashoff,
Center for Advanced Study in Behavioral Science; Dr. John Gilbert,
Harvard Computing Center; and Dr. William Coffman, University of Iowa.



National Assessment is a dynamic project based upon prolonged and
thoughtful planning by prominent scientists such as Dr. Ralph Tyler and Dr.
John Tukey. The fact that it has remained true to its original charge and at
the same time responded to the changing times is repeatedly demonstrated in
the following analysis of the project and its plans for the future.

More than 63,000,000 Americans are directly involved in the educational
process today. In 1974-75 more than $120 billion will be spent by educational
institutions, making this enterprise the nation’s largest. Since the first
meetings were called in 1963 to discuss the nationwide assessment that became
the National Assessment of Educational Progress (NAEP), the expenditure
per pupil in average daily attendance in American public schools has
more than doubled, rising from $419 to almost $1,000. The federal
government’s yearly contribution to the support of education has increased
sixfold, from $2.1 billion to almost $13 billion, in the same period.

These staggering numbers suggest both how critically important it is to
gather national-level data about educational achievement and how enormously
difficult the task is bound to be.

The National Assessment of Educational Progress is the only attempt that has ever been
made to appraise systematically this gigantic American commitment. The cost of this
farsighted attempt to determine the effectiveness of the educational enterprise represents
3/100 of 1 percent of the annual federal investment and 4/1,000 of 1 percent of the total
yearly investment in American education.
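
The proportions quoted above can be turned into rough dollar figures. What follows is a minimal sketch in Python, assuming the rounded amounts given in the text (more than $120 billion in total spending, almost $13 billion in federal spending, $2.1 billion federal in 1963, and per-pupil spending rising from $419 to almost $1,000). The names used are illustrative only, and the text itself does not state the Assessment's budget. Because the inputs are rounded, the two fractions imply somewhat different amounts; both land at a few million dollars a year.

```python
# Rounded figures as quoted in the text; treated as exact here purely for illustration.
TOTAL_SPENDING = 120e9          # "more than $120 billion" spent in 1974-75
FEDERAL_SPENDING = 13e9         # "almost $13 billion" federal contribution
FEDERAL_SPENDING_1963 = 2.1e9   # federal contribution when planning began
PER_PUPIL_1963 = 419            # dollars per pupil in average daily attendance
PER_PUPIL_NOW = 1000            # "almost $1,000"


def fraction_of_one_percent(amount, numerator, denominator):
    """Return (numerator/denominator) of 1 percent of `amount`, in dollars."""
    return amount * numerator / denominator / 100


# "3/100 of 1 percent of the annual federal investment"
cost_from_federal = fraction_of_one_percent(FEDERAL_SPENDING, 3, 100)
# "4/1,000 of 1 percent of the total yearly investment"
cost_from_total = fraction_of_one_percent(TOTAL_SPENDING, 4, 1000)

# Growth ratios behind "more than doubled" and "sixfold".
per_pupil_growth = PER_PUPIL_NOW / PER_PUPIL_1963
federal_growth = FEDERAL_SPENDING / FEDERAL_SPENDING_1963

print(f"Implied by the federal share: ${cost_from_federal / 1e6:.1f} million")
print(f"Implied by the total share:   ${cost_from_total / 1e6:.1f} million")
print(f"Per-pupil growth factor:      {per_pupil_growth:.2f}")
print(f"Federal growth factor:        {federal_growth:.2f}")
```

The two implied costs differ only because "almost" and "more than" hide the exact base amounts; the growth factors are consistent with the text's "more than doubled" and "sixfold".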

The consequences of investing such a minuscule proportion of the federal
outlay in an effort to find out what $120 billion is buying the American
taxpayer have already been manifold. And the future promises even greater
returns on the investment. When a school board member wants to know how
many seventeen-year-olds can read at a certain level today, or how many
more can comprehend at a certain level than could a few years ago; when a
legislator wants to know what level of political awareness is typical for
thirteen- or seventeen-year-olds; when a researcher wants to characterize American
attitudes toward music or art or literature; when a government agency or
foundation wants to know whether there is a need for a Right to Write
program or a National Mathematics Foundation or a major commitment of
funds to particular career-directed or vocational programs, or whether the
nation needs a major investment in programs designed to increase participation
in the fine arts, there is currently only one hope that the necessary
information exists or is forthcoming: the National Assessment of Educational
Progress.

The following pages set forth a rationale for the Assessment and describe its 





major goals and accomplishments. It is hoped that after finishing this brief
introduction the interested reader will turn to some of the many detailed
reports of Assessment findings for a more concrete appreciation of the scope
and importance of this project.

J. Stanley Ahmann

Project Director, 1971-1975



2 


CHOICES FACED by 
NAEP’S PLANNERS


The primary goal of the National Assessment of Educational
Progress is to provide educators and those who allocate funds for education
with concrete information about educational achievement.

The assessment idea grew out of the realization, in the early sixties, that
such information did not exist and would be needed if the federal investment
in education continued to run into the billions of dollars. Given the decision
to gather such information, it was necessary for the planners to choose among
alternative types of information, various means of gathering it, and numerous
ways of analyzing it. The choices made in those early years determined the
character of the National Assessment’s developmental problems but also laid
the groundwork for its great potential as an educational resource. Had other
choices been made, the Assessment would have faced other problems and
would now serve other purposes.

Accordingly, any evaluation of the Assessment must begin with a consideration
of the choices facing Dr. Ralph Tyler, Dr. John Tukey, Dr. John Gardner,
and the many other early planners of this unprecedented project, as well
as the choices the Assessment confronts today as it continues to strive for
improvements.

[...] who was to be served by a national assessment? The answer,
[...] educational policymakers at the national level:
people who must decide, for instance, if more federal money should be invested
in raising the national level of literacy, or if the federal government should
allocate more money for vocational programs (and if so, which ones?), or if
there should be a National Endowment for Humanities (and if so, what areas
in the humanities need strengthening through government support of what
programs?), and so on. Ideally, the audience should consist of individuals in
executive branches of federal, state, and local governments, members of Congress,
legislators, school board members, and every citizen who votes. A national
assessment would not be established primarily for educational researchers,
this in spite of the fact that many of the founders were themselves
established researchers who would have relished an opportunity to test
hypotheses on a national scale. This early decision, made in the [...]
practicality, narrowed somewhat the audience for the assessment [...].
[...] assessment data do have research potential and have, in fact, contributed
to educational research, they were not [...] information.

The second planning question then [...]
would be useful to these policymakers, given that [...] would always
face limited funding? The answer was very [...]: the project’s efforts
would be “census-like,” analogous to measures of the incidence of
unemployment, its changes from time to time, [...] distribution among
regions, sexes, and various social [...]
educational deficiencies and inequities and changes in them; however, any
effort to gather enough data to [...] fully those deficiencies and inequities
would be too large an undertaking [...]
could best be met with smaller scale studies of a different kind.

Census-like data, the planners knew, [...] not be very dramatic.
People expecting quick and simple [...] fundamental questions
(Why can’t Johnny read?) would [...] national assessment results.
But there was an urgent need, [...] urgent today, to
document and measure the magnitude of educational deficiencies and inequities
as an important step in [...] might shape the
future. As efforts are made to improve achievement in certain areas, we need
to know which efforts work and which [...]. During the Assessment’s first
seven active years, the [...] changes have been obvious
and sometimes negative. [...] Assessment have often been
unrealistic, and a certain amount [...] inevitable. In some
respects, the [...] unavoidable [...]
struggle for credibility, recognition, [...]
projects and [...]

202 RESPONSE OF THE NA TIONAL ASSESSMENT 


the ycars - but ° n ,he wh ° ,e ' > h = a™, „ „ 

"T-- d '" ,CUlt “ flnd —nem Hems ,n the early yeais 

“17? n ° ° ne Wh0 k "°- -hat the “state cl the art” was tn the 
„ TZ „ ” aJOr *“ dC ' Cl0pCrS had budl a *rong reputatton for sac 
Zdl?Z° tet,nS ° f 3 VCr >' ,CW “8"“*™ skills L Assessment's 

mrrr ntS m the doma1 "' *- — W » areas such as ar, 

aUmTt’ ,t d T P ,r' a " d d ' agnOS,,C *“""8 * aad tor innovative 

industry of ,1 A ° a!i S,mpI > " cctdcd the capabilities of the testing 
« " day A dd “ that «* Assessment's dlfLlties getung exer 
Ze Zon Z , r " leW PrOCedUr “ “P'^ented when „me and money 
oZ or 1 “ n °‘ SUrPramg that o f ‘he thousands of items dev el 

^1n utn i T SSmentS ° f dub — quality Fortunately, the qua, 
ZZLZlo 0 ' “ P °° h 25 a Who,e ' oo-h-uotf ™«h that ol the 
the poor items a " d anaI >' ses - outweigh the distracting effects of 

VZZnZZ t U " d ‘ he,r Wa >' m '° a “I the ear,, assiments 
the mixed oual t ? T “* m ° re atten *' 0 " from the Assessment staff than 
T 3Sra ' nt ' ,cms ' - *ho fater discussion of NAEP goals 
beend e vo te S :"“ , /WSmC,U “ 3 '° n * '™ operation, much Con has 
high standard f ”* apment of Procedures for ensuring that the Assessment’s 
plmt that S ™ ! qual,,> arc bei "8 -O' Suffice to say a, this 
~ 'JZZrr CaC, ‘ >K * r are fan the last, thanks to a 

upon,tscontrLn'^ K:0 i, ntr01 prOCedure and NAEP's considerable influence 
points out NAEP^'L h i mea3Ur ' m ' ntCOmmUn,,y ^ William Greenbaum 
Tons of relyum so h . P *"»1» °< *<= ™™.« lim.ta 

cogmt.se ZZZ ZZL°:Z m 'f““ d and — „ defined 
and comprehensive* OVV w ^ es P reat ^ ,n finding more constructive 

162) NAFP n ha lnd ‘" d ual educational deselopment” (page 

higher o„aL ,n ng ' ha ' “*«- “** and the consequence wrfl L 

suer quant, ,n excretes, greater usability of resulls 

^ZZTZZZZZZ " “ d — exercises 

and reporting percentages of people who are meeting specific objectives. This
approach has sometimes been referred to as a criterion-referenced measurement
(CRM) approach, but as that term has acquired more specific meaning, it no
longer applies. The assessment is objectives-referenced. No attempt has ever
been made to define achievement levels in a criterion-referenced fashion, that
is to say, "80 percent of the nine-year-olds should meet this objective 90
percent of the time." What percentage of America's teenagers "should" be able
to sing or paint a


CHOICES FACED BY NAEPS PLANNERS 


203 


picture, or respond emotionally to a poem, or understand evolution? The
truth is, no one knows for sure what a reasonable percentage of success should
be, partly because concrete achievement data have never been available;
those are the data NAEP is attempting to collect in the first place! And one
of their primary uses may well be to assist CRM test developers in estimating
realistic criteria for learning tasks.

In short, the Assessment planners never aspired to the creation of a
criterion-referenced test as such tests have come to be defined today. They
chose to leave the establishment of performance levels to the readers and
interpreters of the reports of Assessment findings. And so it is today. Each
reader of a report can decide for himself whether 65 percent on a given
exercise is low, about what one should expect of the age level under existing
conditions, or surprisingly high. Each reader can mull over the implications
of the percentages selecting various wrong answers or responding in various
[...] but interesting ways to particular questions.

[...] encourages professional groups to analyze [...], compare the criteria to
the actual results, and [...] the curricular and pedagogical implications of
any disparities. This seems like a reasonable way to apply NAEP data to [...]
responsibility for establishing criteria in the hands [...] capable of doing
[...] well: the learning-area professionals who have at hand the means to
correct inadequacies and respond to potential weaknesses.

All the decisions discussed [...] have had direct consequences for National
Assessment data analysis and reports. Clearly, the reports have had to be
descriptive rather than prescriptive. The Assessment could only be an
information-gathering project, not a direct educational change agent. The
early hope was that legislatures, researchers, and others would examine [...]
and recommend changes where warranted. Accordingly, the first reports were
very cautious, very dry descriptive reports. No inferences were drawn about
what caused the results to be what they were, and no recommendations were
made for [...].

Although several groups did express interest in the early reports, response
was generally [...] that people were not going to use the [...] available.
The Assessment would have to become far more [...] disseminating its results
and [...] their interpretation. [...] been major changes in Assessment
reports over the [...] a considerable effort has been made to enlist
professional groups (e.g., National Council for the Social Studies,





National Science Teachers Association, National Council of Teachers of English,
National Council of Teachers of Mathematics, Music Educators National
Conference) in interpretive efforts. These efforts, coupled with a new
reporting strategy that aims at several different audiences, have greatly
improved the Assessment's visibility and utility, as we will see in the
discussion of more specific NAEP goals.


Given the ultimate goal of the National Assessment, it is apparent that a
number of its major features followed as a matter of course. Each feature
(the primary audience, the content of the assessment, the objectives, the types
of exercises, the analyses and reports) was considered in the context of
practical alternatives and such factors as funding, the state of the art of
measurement, and political realities. There were choices to be made along the
way, and they were made in full awareness of the alternatives and the
consequences. However, the Assessment is always prepared to reconsider choices
if developments warrant change, and it is always looking ahead for ways to
expand its utility, as we shall see in the discussion of specific goals. The
Assessment is a flexible project, both responsive to the times and true to its
original mandate; it is growing, refining its procedures, and interacting
profitably with such audiences as legislators, curriculum specialists,
classroom teachers, and parents concerned about American education. For a
project of such scope and importance, it is off to a very impressive first
half-decade.



3 


GOALS of the ASSESSMENT 

and PROGRESS toward 

ACHIEVING THEM 


GOAL 1: TO DETECT CHANGES
IN THE EDUCATIONAL ATTAINMENTS OF YOUNG
AMERICANS

The ‘‘baseline’’ period is over no, lor 
era has began In .975, NAEP ** “T 

ment since 1969, stimulaling national sc]e „„l, rally literate elector- 

science education and the implications ol a ^ ^ ^ , he year re- 

ate The NAEP Writing Mechanics rep» ’ seventeen year-olds and thir 
vealed declines in overall writing skill , or mne-year-olds Tin 

teen-year-olds but a surprising P°”“ n wntmg ’s importance to h, 

information, too, sparked nation*' *"£££ 'to isolate major nrmng 
society and prompted a number ol s ud « S B Read Program, NAEP 

problems and find cures increase in the »"* 

i u! u-N the results showing an unexp wl || see reports o 

changes ,n R ' ad ‘" S r iherealter will bring non „ „„ pn- 

and Citizenship „ NAEP sets about S 

changes in American educa. 
manly established to do 





National Assessment's unprecedented attempt to measure educational
change has forced it to sail in uncharted waters. How much of a change in
attainment should one expect to see in a four- or five-year period? If it is
small, will it be detectable? What kinds of sampling, weighting, and analysis
will best enable NAEP to detect changes of one or two points? What statistical
and analytical operations will make us confident that the change is real
and not a reflection of disproportionality of populations or subtle changes in
test administration or sampling variability?
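
The question of whether a change of one or two points is detectable can be
made concrete with a small calculation. The sketch below is only an
illustration under stated assumptions: it treats the two years' percentages as
estimates from independent samples, uses the textbook standard error of a
difference, and inflates the variance by a hypothetical `design_effect` factor
to stand in for clustered sampling. The function names and sample sizes are
invented for the example and are not taken from NAEP's procedures.

```python
import math


def change_with_standard_error(p1, n1, p2, n2, design_effect=1.0):
    """Return (change, standard error) for two independent proportions.

    p1, p2: proportions answering correctly (0..1) in the two years.
    n1, n2: numbers of respondents in the two years.
    design_effect: assumed variance inflation for a clustered sample.
    """
    var1 = design_effect * p1 * (1.0 - p1) / n1
    var2 = design_effect * p2 * (1.0 - p2) / n2
    return p2 - p1, math.sqrt(var1 + var2)


def is_detectable(p1, n1, p2, n2, design_effect=1.0, z=1.96):
    """True when the change exceeds z standard errors (z=1.96 is roughly 95%)."""
    change, se = change_with_standard_error(p1, n1, p2, n2, design_effect)
    return abs(change) > z * se
```

Under these assumptions a two-point drop from 65 to 63 percent is not
distinguishable from sampling variability with 2,500 respondents per year, but
it is with 10,000 per year unless the design effect is large.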

The Assessment staff, along with a special analysis group headed by Dr.
John Tukey, have given highest priority to the solution of these thorny
problems. They have developed procedures to ensure that NAEP's percentage
estimates are as "sharp" as any that can be obtained anywhere, and they
have developed a procedure to balance out "masquerading" due to differences
between proportions of various groups represented in samples selected
in any two years. Numerous other innovations (e.g., new applications of
methods for estimating standard errors, improvements on the NAEP
balancing technique, and other approaches too technical to list here) are
[...] Frontiers of Educational Measurement and Information Systems, edited by
William E. Coffman. Suffice it to say that the technical accomplishments of
the Assessment in this area have been considerable, and its innovative plans
for the refinement of change measures are sure to have a major impact upon
[...] educational measurement. Nowhere are the standards for [...] being
advanced with more diligence and more potential importance for the future than
within the National Assessment program.
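
To illustrate what "masquerading" means, the sketch below shows how a shift in
the mix of groups between two samples can look like a change in achievement,
and how weighting both years by a common mix removes it. This is ordinary
direct standardization, offered as an assumption about the general idea; it is
not a description of the NAEP balancing technique itself, and the group labels
and function names are hypothetical.

```python
def overall_percentage(group_percent, group_share):
    """Overall percentage as the share-weighted sum of group percentages."""
    return sum(group_percent[g] * group_share[g] for g in group_percent)


def balanced_change(percent_y1, share_y1, percent_y2, share_y2):
    """Change between two years with both years weighted by one group mix.

    The common mix used here is the average of the two years' shares,
    which is an illustrative choice only.
    """
    common = {g: (share_y1[g] + share_y2[g]) / 2.0 for g in share_y1}
    return (overall_percentage(percent_y2, common)
            - overall_percentage(percent_y1, common))


# Hypothetical data: neither group's performance changes between years,
# but the second sample contains relatively more of the lower-scoring group.
percent = {"group_a": 80.0, "group_b": 50.0}
share_year1 = {"group_a": 0.5, "group_b": 0.5}
share_year2 = {"group_a": 0.4, "group_b": 0.6}

raw_change = (overall_percentage(percent, share_year2)
              - overall_percentage(percent, share_year1))
adjusted_change = balanced_change(percent, share_year1, percent, share_year2)
```

In this made-up case the raw figures show a three-point decline even though no
group changed; the balanced comparison shows no change.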

[...] realizes that the measurement and reporting of change are [...]
priorities; it must continually improve its procedures to ensure the accuracy
and increase the utility of those measurements. Accordingly, it is engaged in
a number of refining activities.

[...] the complex question of changing objectives. The problem the Assessment
faces is that if objectives remain the same over two or three [...], they may
lose their relevance to educators; if they keep changing, [...] cannot report
progress or decline in achievement of [...] objectives. The first step in the
solution of the problem is to create [...] objectives, that is, objectives
that truly reflect the essential aims of

Data' presenteeT by^l^Stanl ^ a'k^ L °° 1 ‘ ' h ' A " al > , ' ! of Na "° nal Assessment 

a f Ah "“ a ’‘ “ "■«“"> E Coffman fed) F,Mur, «/ 
pp 89- Hi "formation Systems— 1973 (Boston Houghton Mifflin 1973) 





learning-area instruction at any time in history. The new Literature
objectives, mentioned earlier, are a step in this direction. The next step is
to "weight" the objectives so that a particular assessment will emphasize those
objectives most relevant to the profession at a given time. As emphasis in the
profession changes over the years, so will emphasis within the assessment
objectives change to keep pace and ensure relevance of results. Today, the
Assessment routinely requests hundreds of subject matter consultants and lay
people to weight the objectives and subobjectives for their relevance and
importance. The staff plans to refine this procedure even more in the next
several years.
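
One way to picture the weighting step is as a conversion of reviewers' ratings
into proportions that then govern how many exercises each objective receives.
The sketch below is a minimal illustration only: the rating scale, the use of
a simple mean, the largest-remainder rounding, and all names are assumptions,
since the text does not say how NAEP combined its consultants' judgments.

```python
def objective_weights(ratings):
    """Convert each objective's list of ratings into a weight; weights sum to 1."""
    means = {obj: sum(r) / len(r) for obj, r in ratings.items()}
    total = sum(means.values())
    return {obj: m / total for obj, m in means.items()}


def allocate_exercises(weights, n_exercises):
    """Split n_exercises in proportion to weights (largest-remainder rounding)."""
    exact = {obj: w * n_exercises for obj, w in weights.items()}
    counts = {obj: int(x) for obj, x in exact.items()}
    leftover = n_exercises - sum(counts.values())
    # Hand any remaining exercises to the objectives with the largest fractions.
    by_remainder = sorted(exact, key=lambda obj: exact[obj] - counts[obj],
                          reverse=True)
    for obj in by_remainder[:leftover]:
        counts[obj] += 1
    return counts


# Hypothetical ratings (1 = low relevance, 5 = high) from three reviewers.
ratings = {
    "objective_1": [5, 4, 3],
    "objective_2": [2, 2, 2],
    "objective_3": [4, 4, 4],
}
weights = objective_weights(ratings)
plan = allocate_exercises(weights, 40)
```

If the profession's emphasis shifts, only the ratings change; the same
objectives can be kept from one assessment to the next while their share of
exercises moves with the weights.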

Aware that exercises must have exceptional integrity if they are to [...]
to measure change over five- or ten-year periods, the Assessment has [...]
considerable attention to the refinement of exercise review and quality
control procedures. Today there are four times as many item reviewers as there
were in the first two years, and items are scrutinized from every angle by
subject matter experts, professionals in [...], measurement specialists, and
experienced staff. In addition, [...] required to explain in writing what a
particular exercise [...] relates to an objective or subobjective, [...] those
who select the correct answer, what we can infer [...] who select particular
wrong answers, what should be reported, and [...]. No exercise will pass
without such a rationale, and all items sent [...] revision are reviewed
again. Throughout this elaborate and [...] process, writers and reviewers are
constantly made aware of the special requirements imposed by our first
priority goal: to measure change over [...].

There are other changes in the NAEP model that could improve efficiency
in measuring change but would alter the Assessment in other ways.
All such changes are given serious consideration as the organization strives
to respond to the many demands made upon it. If they appear to be in order,
the Assessment, like any growing organization, will make them.


GOAL , TO -MSESSK" 
OF YOUNG AMERICANS 

- — £ r- — ™ r 

sment project 5 





seventeen-year-olds and young adults, it gathers only the kinds of information
one can obtain in assessment situations, and it gathers only so much
testing information as it has time and money to gather. But it intends to
obtain a valuable (and heretofore unavailable) kind of information which,
when coupled with other kinds of data about education (enrollment figures,
cost per pupil ratios, standardized results, small-scale studies, etc.), can
assist people in spotting deficiencies and, in general, seeing where American
education might be headed.

Such a goal poses numerous questions. Should the Assessment stay with ten
learning areas or should it have more or fewer? Should results be analyzed
with the current five variables or should the Assessment employ other
background factors which might correlate more highly with achievement and
allow more hypothesis testing? Should Assessment coverage in particular
learning areas be broader or deeper, that is, should the Assessment address
particular subobjectives in sufficient depth for research, or should it attempt
to measure all objectives equally?

Some of these questions have been dealt with briefly in the earlier discussion
of NAEP's overriding goal. However, they deserve more comment and
should be examined in the light of the Assessment's accomplishments to date.
To begin with, NAEP has not quite completed its initial mandate to reassess
in ten learning areas. There have only been three reassessments so far,
and it will be many years before all ten areas have been covered twice. This,
[...], represents a tremendous undertaking and, considering that it takes
five years to develop an assessment, it seems unlikely that NAEP could
[...] its area coverage, if only for logistical reasons. In fact, it has
already [...] Social Studies and Citizenship and combined Literature, Art,
and Music into a Fine Arts assessment in order to use resources more
efficiently.

[...] should not be ignored in any discussion of potential changes of
coverage [...] much information has already been collected. Before one
suggests [...] Assessment, one should appreciate the enormous amount of [...]
on hand about achievement in Science, Writing, Citizenship, Reading,
Literature, Social Studies, Music, Mathematics, Career and Occupational
Development, and Art. Most of these data have been analyzed in [...] of many
possible ways. The Assessment performs several basic analyses, [...] or three
reports about an area, and then hopes others [...] materials from different
perspectives. Some professional organizations have, indeed, written
interpretive studies of the results or turned them over to their research
committees; with data tapes now available





to researchers in some learning areas, it is likely that much more will be done
with existing information in the near future.

Consolidation of learning-area assessments does not limit assessment
coverage; in fact, it opens the door to other possibilities. National
Assessment is already cooperating with the Right to Read program in a special
reading assessment. Plans for the near future include other special probes for
particular kinds of information, a feasibility study for a functional literacy
assessment, and creation of an "index of basic skills" assessment. All such
efforts would increase the immediate utility of assessment results and provide
national-level planners with kinds of information they desperately need today.
Should NAEP use other variables than it now uses? There is no easy answer
to this apparently simple, but, in fact, very difficult question. As a first
step in answering it, however, NAEP contracted with Westat, Inc. to conduct
a thorough review of the literature of background variables. The results
of that review appear in a NAEP monograph entitled Associations [...]
considering the implications of that review for [...] factors, not with a view
toward explaining [...] educational inequities (the review makes it clear that
no existing [...] used causally), but with a view toward creating cleaner
[...] not overlap as much as the current ones do and that [...] compatible
with factors used in other educational studies. Other variables [...]
conceivably assist NAEP in presenting better descriptive data; there is [...]
intention to explain educational achievement. Nevertheless, given variables
[...] help explain educational achievement, NAEP would be wise to [...] some
of them because they could be used to strengthen the analysis [...]
interpretations. For example, performance on an algebra problem may depend a
good deal on the amount of algebra studied. A profitable analysis might view
not only the raw data, but comparisons [...] adjustment for differential
amounts of algebra studied. [...] many ways of explaining results, and
stronger variables may well help educators better interpret the findings.
The issue of assessment breadth versus depth is another one that the
Assessment has wrestled with for [...] funds available, the Assessment still
could not examine all objectives in great depth, simply because the
measurement tools for [...] are not available. Some educational objectives
will always be slighted, especially under the current and projected budgets.
In part the effort to weight objectives has been a response to this dilemma.
In effect, Assessment is saying to the educators and scholars who weigh
objectives, "[...] everything you desire; what things





should be given highest priority?" The hope is that what we do measure we
will measure well.

But there are some auxiliary questions about breadth and depth that beg
for clarification. What is "depth" and what constitutes "breadth"? Depth can
simply be quantitative, i.e., we need x number of items in order to be able to
discuss achievement of a subobjective or objective. Clearly, x will vary with
the complexity of the task involved. But those x items should not be all of the
same type, nor should they all approach the skill being measured from the
same angle; furthermore, they cannot all be easy, or all difficult.

In an attempt to resolve this problem, the Assessment has been investigating
new exercise development techniques. Most notable among these is an
attempt to examine exercise depth in the light of current theories about
cognitive and affective development. Given that there will be, say, forty
exercises in the content domain of biology, we would like those exercises to be
distributed over the range of cognitive and affective levels articulated in
Bloom's and Krathwohl's taxonomies of educational objectives.2
The Assessment's financial situation would limit the number of content
domains; however, within the domains tested there would be considerable
strength for generalizing about performance.

It is often suggested that another answer to the breadth versus depth problem
is to create enormous numbers of exercises and then choose exercises in a
series of stratified random selections. All indications are that the creation
of a [...] and comprehensive number of exercises is not economically
feasible.
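
For readers unfamiliar with the suggestion, a stratified random selection
simply means drawing a fixed number of exercises at random from each category
of a larger pool, so that every category is represented. The sketch below is a
generic illustration using Python's standard `random` module; the pool, the
stratum labels, and the function name are hypothetical and do not describe any
NAEP procedure.

```python
import random


def stratified_selection(pool, per_stratum, seed=None):
    """Draw per_stratum exercises at random, without replacement, per stratum.

    pool maps a stratum label (for example an objective or a difficulty
    band) to a list of exercise identifiers.
    """
    rng = random.Random(seed)
    selection = {}
    for stratum, exercises in pool.items():
        if per_stratum > len(exercises):
            raise ValueError(
                f"stratum {stratum!r} has only {len(exercises)} exercises")
        selection[stratum] = rng.sample(exercises, per_stratum)
    return selection


# Hypothetical pool of thirty exercise numbers in three difficulty bands.
pool = {
    "easy": list(range(0, 10)),
    "medium": list(range(10, 20)),
    "hard": list(range(20, 30)),
}
package = stratified_selection(pool, per_stratum=4, seed=1)
```

The economic objection in the text applies to the pool, not the draw: the
selection itself is trivial, but each stratum must already contain many more
good exercises than will be used.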

Yet another response to the desire for comprehensive, usable data is to find
out what people wanted in a particular assessment and did not get, and to
structure the next assessment to ensure that they get it. The project is
designed with a "feedback loop" to increase its efficiency and utility as it
grows. Information about problems arising in a given assessment is passed
along to people creating the next one in time to make improvements. Questions
about the report are asked at the beginning of an assessment cycle, not at
[...]. In that way we can be assured that an exercise was created with
certain specific purposes in mind, the scoring criteria were created in
harmony with these purposes, and the data analyses will be directly applicable
to a


2 B. S. Bloom (ed.), Taxonomy of Educational Objectives: Cognitive and
Affective Domains (McKay, New York, 1969); also D. R. Krathwohl et al.,
Taxonomy of Educational Objectives, Handbook 2: The Affective Domain (McKay,
New York, 1964).





certain reporting strategy. This is not at all an easy matter to coordinate, but
it is being done with increasing effectiveness every year.

Another way of increasing effectiveness and utility of data is to work closely
with people involved in similar enterprises. National Assessment and the
International Association for the Evaluation of Educational Achievement
(IEA) have both profited from exchanges of information about common
problems. Contacts with the Organization for Economic Cooperation and
Development (OECD) have also been fruitful. And the yearly NAEP workshop
for state assessment personnel, often cited as an example of NAEP
influence upon states, is just as profitable to NAEP as it is to others.

What are some plans for the future that will assist National Assessment in
meeting this goal? Not the least of them is a resolve to continue refining many
of the efforts just mentioned, several of which are only in exploratory stages.
Others include the development of new data analyses, efforts to make
variables more compatible with other variables, efforts to place [...]
into a larger context of educational indicators, increased [...] with
other projects, and more aggressive attempts to assist others in the further
[...] of existing NAEP data.

GOAL 3: TO CONDUCT SPECIAL PROBES INTO
SELECTED AREAS OF EDUCATION

National Assessment's system for sampling wl ,hout over 

some “piggybacking” of small probe, upon for ajmsted 

burdening the field staff According y, / imctJ0 nal literacy “mim-assess 

the Right to Read program by year old, are unable 

ment ” The intent wa , ,o a,, " nsh,p , Th j; 

to read at a level sulftcen. for pmdm: emp Ipecla „y selected 

project involved the adm,m,.ra.,on , old! xhe Assessment 

NAEP reading exerc.se, to some 5 passed them along m £ 
scored the exercises, drafted re P or,s £ “racy tndice, «re re P° r '^ 

Right to Read staff for them me ' “ “ “^.een-year-old, both ,n and 
the^ usual NAEP vanabfe groups and fo 

of school -- ma y well to be to crea rcsca rch stud) 

Another such ““ e I97 4-75 ^^Tarf gathenng mfor- 

skills” assessment u “ r “ 5 ^ stlllj ,. „ercises directeo 

using two packages o 





mation about the logistics of such an assessment; meanwhile, the staff has
been working with consultants to determine what skills are fundamental to
active and productive participation in American society, and what exercises
might best detect the presence or absence of those skills. Later there will be a
full-scale feasibility study.

While the exact design of this study is dependent upon many factors
(level of funding, development of objectives and new exercises, and
preliminary indications from the 1974-75 effort), it is presently anticipated
that the Index of Basic Skills would be assessed within the next few years.
While the Index could provide data on the performance of basic skills, the
addition of new background variables would allow NAEP to explore the variables
producing or affecting the performance. NAEP is also considering the
possibility of investigating the performance levels of thirteen-year-olds and
adults on the Index of Basic Skills (IBS).

On the basis of the preliminary studies, the Assessment will determine
whether or not it is possible to establish an Index of Basic Skills, whether or
not such an Index provides unique and needed information about the
performance [...] of seventeen-year-olds, and whether or not this information
can be used and applied. If the Index of Basic Skills is able to satisfy
these conditions, National Assessment plans to proceed with the full-scale
development of the Index.

[...] special probe planned for the 1975-76 assessment year is in the
[...] mathematics. For several years there has been considerable
attention devoted to defining those aspects of mathematics that should be
mastered by [...] citizens if they hope to function effectively in American
society.3 In response to the need for concrete information about existing
levels of achievement in basic mathematical skills, NAEP will conduct this
special study.

The development of the basic mathematics assessment is based on work
that had previously been done in mathematics by the National Assessment.
In 1972-73, mathematics was thoroughly assessed at ages nine, thirteen,
seventeen, and at the young adult level. Exercises used in the special basic
math assessment will [...] from the exercises used during the 1972-73
assessment.

[...] ten content areas involved in the study: numbers and number


3 Max S. Bell, "What Does 'Everyman' Really Need from School Mathematics?"
Mathematics Teacher, March 1974, pp. 196-202.





concepts, numbers and their operations, arithmetic computation, sets,
measurement and estimation, mathematical sentences, geometry, statistics and
graphs, personal and consumer math, and attitudes. For each of these areas
the assessment will determine how many people can (1) recall or recognize
facts, definitions, and symbols, (2) perform mathematical manipulations, (3)
understand appropriate mathematical concepts and processes, (4) solve
appropriate word problems, and (5) demonstrate an appreciation of the need
for skill in the area. This matrix of ten different areas and five approaches to
each will yield a number of specific and relevant reports about strengths and
weaknesses in fundamental mathematical skills.
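
The matrix of ten areas and five approaches can be laid out explicitly as
fifty reporting cells, with a percentage of success computed for each cell
that has data. The sketch below is an illustration only: the area names follow
the list above, but the short approach labels, the record layout, and the
function names are assumptions made for the example.

```python
from itertools import product

CONTENT_AREAS = [
    "numbers and number concepts",
    "numbers and their operations",
    "arithmetic computation",
    "sets",
    "measurement and estimation",
    "mathematical sentences",
    "geometry",
    "statistics and graphs",
    "personal and consumer math",
    "attitudes",
]

# Shortened labels for the five approaches; the wording is ours.
APPROACHES = [
    "recall facts",
    "perform manipulations",
    "understand concepts",
    "solve word problems",
    "appreciate need for skill",
]


def report_cells():
    """Every (area, approach) pair in the ten-by-five matrix."""
    return list(product(CONTENT_AREAS, APPROACHES))


def percent_success(responses):
    """Percentage of correct responses for each cell that has data.

    responses is an iterable of (area, approach, correct) records,
    where correct is True or False.
    """
    totals = {}
    for area, approach, correct in responses:
        right, seen = totals.get((area, approach), (0, 0))
        totals[(area, approach)] = (right + int(correct), seen + 1)
    return {cell: 100.0 * right / seen
            for cell, (right, seen) in totals.items()}
```

Reading across a row of such a table shows where in one content area
performance falls off (for instance, between manipulation and word problems);
reading down a column compares areas on the same kind of task.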

Yet another special probe currently being developed combines reading,
writing, listening, and speaking skills into a communications assessment.
[...] is being developed in stages and will work its way into the schedule
as it proves itself.

Other special probes are possible whenever there appears to be a need for
them and the staff time and resources are available.

GOAL 4: TO PROVIDE ANALYSES AND REPORTS
UNDERSTANDABLE TO, INTERPRETABLE BY,
AND RESPONSIVE TO
THE NEEDS OF A VARIETY OF AUDIENCES

As of the fall of 1974 National ^^‘.^“TNam'nal aLss 
about results in nine learning areas n nro hlems of contract monitor 

ment, staff effort was largely consumed y R „ development little 

mg, sampling, data analysis and objective : and a ^ „ d pr oduc, 

time money,' or staff were available for ce i(y for the 

tts reports In fact NAEP staff f d ““''“J^eo. year (the AnnlV- 
communication of its results until the 0 1 reporting) "hen 

Advisory Commit.ee was largely by the Rescan* and Ana. 

reports in Literature and Reading P ^ shl(ll „g of a«e” u0 " , 

y,s Deparrmen. ^/“"mlai.on and a relaxalion of NAE * 

To begin was impelled by ^ £ 

Department in May audiences in different > for „ 

Til r - produ “ 





* tT P ^ etC ’ Ut " 1Zat,0n/A PP' 1 -„ons' responsibility is ,o pro 

tCChmCaI ' m ° re •»<* selectively^ fo- 

cused ways to educators, legislatois, and many lay audiences 

.tv Po ^l T T T 1r Mm “ Utll,za,!on/A PPl-eat.ons met that respons.b.1 

,7 s :: mdAmiults drew ,rom ,he a 

aZ. Ze ’ Z C ° mnbUte C0 " Cre,e ln ^ orTnatlon to the current debate 
Amen^ *° W W cast new light on 

r?™ ar :r , m an easy - to rad - -'resting, »»**»». 

and audienc ° *** he Musical Pnfomumcc report generated more publicity 

2 fTZ 7 POnX an any PreVIOm rC P° n ’ >«* follow up, A *£*» Z 

Zf ZZ!, ™’ thC ,,Rt NAEP «P<* <0 present result ,n the 
Pro Zonal ° l^' 0 " ° f reMarch ’ —Hum, and reaching issues 

^ » *** 

1969-1974 Rec m R ea ^ tn S Assessment, and Writing Mechanics, 

NAEP?ZvT„ SC ° r rCPOrtS has h"" nnthus.as.ic, inditing that 
15 movln g in the right direction 

often mthibZZ Ut ‘ hty Snd rclevance of 1,5 reports, NAEP consults 
ZTseZe the n " *° d «ermme wha, reports uould 

n^Zem oMhe Nm ,0 " f'r? F ° r » Auji of 1973, 

math exercises, suggZd d,f”c r 'f' B ° f Mathcma,,cs eiuuumed the 
speculated al>niif 1 ^ ombing results for greater utility, and 

so P Z b?NCTM n ^ SOm ' ° f Wh,Ch «>" ,d be wnllen by NAEP, 
.n,eZeu™r a P H , W “ aa NC ™ Proposal , o wnte 

nating their effort wnh ZnaEP renZ Ma ”!' ma,,cs assessment, coord, 
several different audirnr porting effort m order to ensure that 

indeed, wnm a seZ o, „ a i e T “ t"'™' ™ys The NCTM did, 

implications published mdmWn T \ Z Ma,h ' ma,,cs assessment and its 
of elementary classroom teach ‘ taAa ‘ 3 JoumaI that reaches thousands 
focus of these articles has h. sapcrv ‘ s °r s and teacher educators The 
curriculum and pedagomcal e " ° n i!"** appl,callon ° f specific results to 

pons on achievemem m^uch areaTax^co Na,, °" a ' *“““ P“ b ““ " 

only do the reports Am I computation and consumer math Not 

ewers, ftey norzjr ’’T"'" 8 '* ° ' S * Ud '"“ *' h ° amve a, correct an 
percentages working out^ ° h . StUdentS makin S particular kinds of errors and 

and computationaUtudiesZn Z H "‘T ^ Th ~ ™ aaal >~ 

and curriculum reform NCTM '"** bcar,n S u P° n teaching techniques 

pctence ,n probabZ 7Z17T “ a ™<« ab -‘ — 

concepts in problem Lh,n S calcar ^ ^ ° f 

solving calculating area, and other topics for pubhea 





tion in Mathematics Teacher, a journal aimed primarily at secondary school
teachers. In addition, some NCTM writers have drafted a manuscript detailing
the Mathematics assessment's implications for research. All these reports
and articles (plus the mathematics overview, a report about math in the
inner city, complete technical documentation reports, the usual newspaper
coverage, presentations at professional meetings, short "fact sheets" for
legislators, and an intensified dissemination effort) will ensure wide coverage
for the results and greater impact of NAEP upon the educational community
and the public at large.


GOAL 5: TO ENCOURAGE OTHERS
TO APPLY NAEP DATA 
TO THE SOLUTION 
OF EDUCATIONAL PROBLEMS 


The planners of a national assessment assumed that if such a 
ered important information, people would naturally sec it ou a 
This has not proven to be true, apparently because .here are so | many ■ d,f ^ 
ent kinds of information about education availab e I at “ ' h 

no, know where to star, The various programs 
information and coordinate demands 

and until they do it is necessary for projec become more mean 

about dissemination of results Aware that its ^ and , ra „]a,ed 

ingful after they are studied by learning ar fied efforts to help 

into action of some sort, the Assessment staff has intern,,, 
people follow through on the r«uhs ^ fcss , ona l organ, ra 

To date, NAEP has contracted with sever ^ m ,rrpreme 

tions and worked closely Scienc r Teachers Association 

studies of Assessment results T . and speculated about the 

studied the results of the first Science The National Council for 

implications of the results for classroom teac tcnla «ve expectan 

the Social Slud.es examined .he reporl, and devolcd 

cy levels for aeh.evement, “JT^N^onal Assessment and .B finding 

=ss== SSssssssss- 

tively working with 





matics assessment results, as was noted in the discussion of Goal 4. And the
Music Educators National Conference has worked closely with NAEP in the
preparation and dissemination of its Music reports, one of which, A
Perspective on the First Music Assessment, presents the interpretive
observations of prominent music educators.

In an effort to encourage somewhat different but equally important groups
to examine assessment data, NAEP has contacted such groups as the
Association of American Publishers, the Council of Great City Schools, the
American Association of School Administrators, the ERIC Clearinghouse for
Teacher Education, the American Bar Association Special Committee on
Youth and Citizenship Education, the U.S. Chamber of Commerce, and
other groups with a stake in the quality and progress of American education.

Interpretive reports written by members of professional educational
organizations should not be the only ones the Assessment fosters. In the future,
the staff plans to experiment with alternative approaches, such as the
commissioning of nationally known educational leaders to write articles or
papers, the establishment of regular invitational workshops for the generation
of interpretations, or preparation of an "interpretation" manual or kit that
would assist anyone who wanted to interpret results in some way.

When reassessment of a particular area reveals significant changes in
achievement, the Assessment reports immediately to organizations responsible
for curriculum and instruction in that area. This takes the form of a letter
[...] executive summary signed by the Executive Director of the Education
Commission of the States and addressed to the leaders of such organizations
as the National Science Foundation, the U.S. Office of Education, the
American Association for the Advancement of Science, and so on.

Increased quality of exercises, greater sophistication in analysis technique,
involvement of future audiences in the development of assessment materials
[...], and expansion of both educational and noneducational contacts:
these measures have already greatly accelerated interest in interpretation of
[...]. There is every reason to believe that interest in this important last
step will continue to grow over the years.


GOAL 6: TO ASSIST THOSE WHO WISH TO APPLY
NATIONAL ASSESSMENT TECHNOLOGY 
AT STATE AND LOCAL LEVELS 

An extensive outline of National Assessment’s efforts to meet this goal



GOALS OF THE ASSESSMENT AND PROGRESS 


217 


would be too lengthy for this paper. We can do no better for the moment
than to summarize in the following sections the contents of ECS Report No.
48, National Assessment Achievements, which outlines some of the major activities.


Aiding State Efforts 

NAEP has provided consulting services, technical assistance, assessment
materials, and/or data to thirty-seven states interested in developing their
own state education evaluation programs. The nature of the service the
project provides ranges from exploratory and planning sessions with state
education officials, governors, and legislators to supplying special
materials for sta[...]

In addition to assistance to individual state education agencies, [...]
annually sponsors a series of workshops on assessment methods [...] state
officials an overview of the techniques and materials of [...] assessment
pioneered by the National Assessment. The [...] state officials the
opportunity to share experiences [...].

To help states develop greater expertise [...] forming a new organization,
the National [...] Advancement of Educational Assessment, which explores
mu[...] problems on a continuing basis. Charter members of [...]
representatives from California, Florida, New Jersey, New [...].

Some states need no more than [...] methods of large scale assessment.
Others find that [...] data and services of the project can be
economically [...] to fit their needs.

Connecticut, following its first statewide assess[...] in reading, was able
to make direct comparisons between state [...] levels of performance.
Connecticut used this information [...] more effective use of state
education resources. One outcome [...] evaluation and planning effort was a
new stress on urban [...]. Connecticut began using NAEP exercises in a
statewide science [...] the fall of 1974.

The Maine Assessment of [...] Progress was conducted with materials from
several learning [...] performed [...] other children of the same age [...]
this educational inequity [...] Broadcasting [...] for ESAA funds to [...]
television programs [...] Franco-American [...] the educational opp[...]
such a program for [...] a $250,000 grant to conduct [...]





Iowa used NAEP objectives and exercises in 1971 and 1972 to assess a
statewide sample of students in three academic areas: science, reading, and
literature. The Iowa assessment was designed to find out how well state
education objectives were being met. Iowa also uses NAEP materials in its
continuing assessment services to local school districts. The state helps local
school officials tailor Assessment methods to local program evaluation needs.


NAEP and Local School Districts 


Since its inception, NAEP has received hundreds of requests from local
school districts that wish to take advantage of the findings of the project and
the methods it has developed in their own evaluation and curriculum reform
efforts. Many districts have adopted NAEP materials.

One of the most thorough applications of the NAEP approach in a local
district is the student writing demonstration project of Montgomery County,
Maryland, public schools. School officials in this suburban Washington area were
interested in comparing the writing skills of their thirteen- and seventeen-
year-olds to other suburban children. By using NAEP materials and drawing
on NAEP’s assistance, they were able to carry out a districtwide mini-assessment
of writing that revealed their youngsters were performing above national
suburban levels in all but a few instances.

At this point, at least fifty school districts that are either participating in a
state assessment effort or have demonstrated the staff and funding capability
to carry out an effective program have received direct assistance from the
staff. Since the NAEP staff is small, assistance to other districts has
necessarily been limited to providing materials and written monographs
about the project’s methods, results, and design.

Following is a brief description of four “model” school districts that
demonstrate the kinds of impact the project has on local school programs.


San Bernardino, California. In 1972, the San Bernardino school system
undertook a major evaluation of its curriculum offerings and goals. The
[...] compared local curriculum objectives with the educational
objectives developed for NAEP by panels of scholars, educators, and
interes[...]. [...] was a total revision of local curriculum offerings and
a [...]ment of educational goals. The San Bernardino effort illustrates the
typical approach: adaptation of NAEP educational objectives to suit local
values and needs rather than wholesale adoption.

Lincoln, Nebraska. Lincoln school officials recently asked themselves, “What
should we be teaching?” It was an attempt to take a fresh, hard look at the
education needs of their students. When they found their answers, they
looked to NAEP for help in finding out how well they were achieving those
educational goals. Although the local school district developed its own
answers to the critical question of what should be taught, they found that a
number of NAEP exercises reflected local district objectives. By selecting
those exercises to use in their local evaluation program, Lincoln school
officials were able to compare local student performance with national
performance levels. Lincoln officials also took note of an added bonus: because the
district was able to duplicate the exercises and methods used by NAEP, they
saved both time and money.

The evaluation program showed that Lincoln students’ overall achievement
is superior or equal to national levels with a few specific exceptions.
And it is to these few exceptions that Lincoln teachers are now giving added
emphasis in the instructional program.

Shawnee Mission, Kansas. The Shawnee Mission [...] innovation but wants to
make sure that innovation [...] achievement. After school officials
launched an interdisciplinary curriculum project on American studies [...]
from NAEP in developing means of evaluating its s[...] incorporating social
studies exercises used in the national surveys, [...] hopes to gauge the
comparative effectiveness of their new curriculum approach.

Milwaukee, Wisconsin. Milwaukee education [...] well students had mastered
science subjec[...] Science exercises in a systemwide evaluation of science
achievement in 1971. Milwaukee is using the [...] approaches that will be
effective [...] students’ science achievement.

Other school programs [...] planning or are using the NAEP approach or
materials include

Jefferson County Public Schools, [...]
Atlanta Public Schools, [...]
Bellingham Public Schools, Washington
Cheyenne Mountain Schools, [...]
Herricks, Long Island, Public Schools, New York
Monterey Peninsula [...]
[...], Wisconsin
Overseas Dependents Schools
Kamehameha [...], Hawaii
[...], Maryland
Greeley Public Schools, Colorado





Bloomington Public Schools, Minnesota 
Fairfax Public Schools, California 
Rockville Centre, New York 
Philadelphia Public Schools, Pennsylvania 
Chicago Public Schools, Illinois 
Columbia Public Schools, Missouri 

GOAL 7: TO CONTINUE TO DEVELOP, TEST,
AND REFINE THE TECHNOLOGIES NECESSARY 
FOR GATHERING AND ANALYZING NATIONAL 
ASSESSMENT ACHIEVEMENT DATA, AND TO 
CONTINUE RESEARCH STUDIES NECESSARY FOR 
THE RESOLUTION OF EXISTING PROBLEMS 


Simply put, this goal affirms the Assessment’s commitment to improving
itself in every way possible. No critics of the project are more demanding than
the National Assessment staff members themselves. Some of their plans for
improvement have already been mentioned earlier in some detail. Briefly,
they include:


Continued refinement of the exercises and objectives development
processes, with emphasis upon increasingly effective use of weighting,
creation of stable “meta-objectives,” employment of Bloom’s Taxonomy in
exercise pool evaluation, increasing the number of reviews by a factor of
four, and more efficient use of staff resources

Continued investigation of the validity of NAEP exercises and the
reliability of NAEP scoring techniques

Continued study of the sources of systematic nonsampling errors in
assessment data. Since systematic bias can crop up anywhere in the [...]
process, [...] packaging of exercises to types of analysis, [...]
monitoring of itself is a major NAEP activity

Development of [...] sampling, packaging, and analysis techniques for
generating interaction data

[...] of the national try-out system to make it more efficient and make
better use of try-out data

[...] exploration of sampling design changes and their implications

[...] NAEP’s capacity to respond quickly to new ideas and mobilize [...]
without compromising its primary goals

[...] Management Information program designed to assist the Assessment
[...] efficiency and effectiveness

[...] rigorous research program that continues its study of such things
[...]

(a) The influence of exercise format upon results
(b) The problems that hinder direct relationships between exercises and
    objectives
(c) The influence of package organization upon results
(d) The presence or absence of a fatigue factor in results
(e) The advantages and liabilities of various long-range sampling plans
(f) Determination of the effects of refusals and nonresponses, both by
    individuals and large districts or cities
(g) Alternative incentive strategies
(h) Sources of between- and within-scorer variability and gener[...]
    problems with scorer bias
(i) Alternative ways of aggregating data within [...] areas, across
    learning areas, and across years
(j) Improved data adjustments
(k) Exploration of possible multivariate approaches to data analy[...]
(l) Alternative reporting and dissemination strategies based on studies of
    user needs and more


It is important to note that all this [...]ing the Assessment in radical
ways in respon[...] or a new definition of its purpose. These are e[...]
improve the Assessment’s effectiveness in carrying out its original
mission. Naturally, as the organization refines its procedures an[...]
efficiency, it will remain alert to developments in measuremen[...] that it
can or should respond or contribute to.

This concludes a very brief [...] the accomplishments and aspirations of
the National Assessment [...] it [...] an ambitious undertaking. If it
achieves its goals, [...] positive impact upon American education.



4 


CONCLUSION: 
COMMENTS ON THE
NATIONAL ASSESSMENT, 

1974 


SOME COMMENTS ON “RECIPES, WRAPPERS, 
REASONING AND RATE”


organized, good data base, offers broad research information, useful
publication, am sharing with appropriate staff member.”

Dr. L. Harlan Ford
Assistant Commissioner of Education
Texas Education Agency
Austin, Texas


“[...] provided through ‘Recipes, Wrappers, Reasoning and Rate’ [...]
interesting and most revealing. The content is made more meaningful [...]
mutual interest and [...] reported in the preparation can be used in certain
programs [...] by the State Department and implemented at the local level.”

George W. Burton
Assistant Superintendent for Administrative Field Services
State Department of Education
Richmond, Virginia



“An interesting report which I intend to use with various groups, particularly
to extend the speculation of the panel as to the interpretation of the data. At
least the report supplies benchmarks for additional research and program
development.”

F[...]
Director, Division of Teacher Education, Indiana University
Bloomington, Indiana


“Thank you for your note about the [...] ‘Recipes, Wrappers, Reasoning and
Rate.’ As you can see from the [...] have already noted its attractiveness
and usefulness an[...] the attention of our members. It is much more
comprehensible than [...] technical report and should help them see the
total picture.”

Ralph C. Staiger
Executive Director, International [...]


“Excellent, very understandable, nice job of putting the data together, very
useful.”

A. Sterl Artley
Professor of Education, University of Missouri
Columbia, Missouri


“Very helpful to our [...] and students. Please keep us on your mailing
list.”

Dr. A. Garr Cranney
[...] Florida




POLITICAL KNOWLEDGE 

“[...] them over to the [...] studies division of the [...] studen[...]
[...]cularly found helpful [...]owledge.’”

D[...]
Chairman, Philadelphia Bar Association
Young Lawyers Section
High School Law Course Committee

“Excellent publication. It will be of tremendous help in shaping the course
we teach on ‘Contemporary Legal Problems’ at our high school.”

Dr. Charles M. Wetterer
Principal, Cold Spring Harbor High School
Cold Spring Harbor, New York

“Provides excellent data for those of us who have the responsibility for
developing a social studies curriculum.”

Anthony J. Petrillo
Coordinator, Social Studies
Jefferson County School District
Lakewood, Colorado

“I had earlier sent for and received the ‘Political Knowledge’ monograph
and have found it an excellent basic source for me and my writings
and for my graduate and undergraduate students.”

Melvin Amoff
Professor, Kent State University
Kent, Ohio

“I found ‘Political Knowledge and Attitudes, 1971-72’ a most useful and
[...] booklet. I am showing it to everyone in my classes and in our [...]”

Gail R. Kirk
Consultant, Law Focused Education
University of Washington
Seattle, Washington

“[...] ‘Political Knowledge and Attitudes’ to participants in [...]”

Mary Jane Turner
Staff Associate, Social Science Education Consortium
Boulder, Colorado





SOME COMMENTS ON “A PERSPECTIVE
ON THE FIRST MUSIC ASSESSMENT”

“I have always found NAEP publications most helpful and hope that I may
continue to receive them. You are performing a very important an[...]
service for all of us seriously concerned about education.”

[...]
Associate Director, National Humanities Faculty
Concord, Massachusetts

“Highly important to all music educators. I hope the results have been widely
disseminated.”

Thomas Brown
Associate Professor of [...] University
Morgantown, West Virginia

“An interesting report, somewhat [...] results written up in the
newsletter. Certainly points up the fact [...] music educators need to take
a hard look at goals, philosophies, [...]”

Sara Holroyd
Director of Choruses [...]

“A worthy and much needed [...]ate NAEP getting the profession[...] groups
cited in this report [...] (to me, at least) we [...]tions to go before
the [...] program can attain [...] particularly to the matters of
understanding music in [...] (social, political, religious, economic, [...]
biological) context, and t[...] using music.”

Paul A. Haack
Associate Professor, Music Education and Music Therapy
[...] Kansas
Lawrence, Kansas

“Information was [...] and conveyed meaning [...] The copy [...]
selected aspects of the [...] state supervisor of music [...]”





Randall L. Broyles
Assistant State Superintendent, Department of Public Instruction
Dover, Delaware

“Impressive. It looks like a study every music educator should be thoroughly
familiar with.

“I should like to include comments on this publication, and the succeeding
related publications, in our official publication, MUSART, beginning with
the fall 1974 issue. If there is additional material you would like us to
consider, we should be glad to receive copies, but our deadline for the next
issue is soon.”

Sister Jane Marie Perrot
Executive Director, National Catholic Music Education Association
Hyattsville, Maryland

“A very good beginning, provocative of further study.”

Oscar Brand
Vice President, Songwriters Hall of Fame
Great Neck, New York


SOME COMMENTS ON “SOCIAL STUDIES: AN 
OVERVIEW” 


“[...] I intend to relay portions of report to teachers in my department
so that weaknesses shown in report can be corrected in our school.”

Ermil Jones
Chairman, Social Studies Department
Belleville High School
Belleville, Michigan


“[...] most beneficial publications I have ever received. It will be
very helpful in assessing our program and making recommendations.”

Ms. Harriet Parrish
Social Studies Coordinator
Winston-Salem/Forsyth County Schools
Winston-Salem, North Carolina

“I have briefly read the report and find it most helpful and interesting. It will
provide us with some guidelines from which to work as we develop our
curriculum and courses at the regional high school where I teach and am
Department Coordinator.”

Priscilla Blanchette
Social Studies Department Coordinator
RHAM Jr.-Senior High School
Hebron, Connecticut


“Interesting results and reporting. Results have profound implications [...]
educational planning.”

Dr. Lee H. Smith
Department Chairman and Project [...]
St. Louis Park High School
Minnetonka, Minnesota

“[...] valuable [...] because it will [...] own curriculum development in
the soci[...] with how clearly the data and conclusions are prese[...]
will be discussing the report with our social studies supervisor.”

[...] J. Kramer
Director of Curriculum, Moon Area Schools
Coraopolis, Pennsylvania

“I have found ‘Social Studies: An Overview’ [...] be interesting and
informative. I am sure that as we progress wi[...] assessment efforts that
this type of reporting style and content will prove useful.”

Michael Hartoonian
Social Studies Specialist
State of Wisconsin Department of Public Instruction
Madison, Wisconsin


NATIONAL ASSESSMENT POLICY COMMITTEE

[...] (Chairman of Policy Committee)
[...] Leber, McBride, [...]
[...] National Association of [...] Boards of Education

[...]
[...] College of Education
Washington State University
Pullman, Washington

William R. Conwa[...]
[...] Education Committee
[...] House of Representatives





Dr. Lyman V. Ginger
Superintendent of Public Instruction
State Department of Education
Frankfort, Kentucky

Dr. George Kozmetsky
Dean, Graduate School of Business
University of Texas

Ms. Joyce E. Lewis
Maine House of Representatives
Education Committee

Dr. Bill Lillard
Superintendent of Schools
Oklahoma City, Oklahoma

Dr. Frederick Mosteller
Chairman, Department of Statistics
Harvard University

Dr. Julian Samora
Professor, University of Notre Dame
South Bend, Indiana

Ms. Eleanor P. Sheppard
Chairman, Education Committee
Virginia House of Representatives

Dr. Stephen Wright
Vice President, College Entrance
Examination Board

Ex officio

Dr. Ralph W. Tyler
Science Research Associates
Chicago, Illinois


NATIONAL ASSESSMENT ANALYSIS ADVISORY 
COMMITTEE 

Dr. Frederick Mosteller, Chairman since 1973, Professor of Mathematical
Statistics, Harvard University (1970- )

John W. Tukey, Chairman 1965-1973, Professor of Statistics, Princeton
University (1965- )

Dr. Robert Abelson, Professor of Psychology, Yale University (1965-1972)

[...], Professor of [...], University of California at Berkeley ([...])

Dr. William E. Coffman, Professor of Education, University of Iowa (1970-
[...])

[...], Professor of Psychology and Education, Stanford University
(196[...]-1969)

[...], Director of National Opinion Research Center, Chicago, Illinois
(1973- )

[...], Statistical Advisor, Center for Advanced Study in the Behavioral
Sciences, Palo Alto, California (1973- )

Dr. John P. Gilbert, Staff Statistician at the Harvard University Computer
Center (1970- )

Dr. Lyle V. Jones, Professor of Psychology, L. L. Thurstone Psychometric
Laboratory, Dean of the Graduate School, University of North Carolina
(1965- )

Dr. Lincoln Moses, Professor of Statistics, Dean of the Graduate School,
Stanford University (1974- )

Dr. Ralph W. Tyler, Science Research Associates, Chicago, Illinois
(1965-1969)



SELECTED BIBLIOGRAPHY 


Alkin, M. C. “The Development of Evaluation Theory,” Evaluation Comment,
Center for the Study of Evaluation of Instructional Programs, Los Angeles,
October 1969.

Anderson, Stanford (ed.). Planning for Diversity and Choice (Cambridge,
Mass.: M.I.T., 1968).

Andrew, Gary M., and Ronald E. Moir. Information-Decision Systems in
Education (Itasca, Ill.: Peacock, [...]).

Bacher, Françoise. “Some [...] of Real Assessment (Docimology),” in [...]
World Year Book of Education 1969: Examinations (London: Evans Brothers,
1969).

Bauer, Raymond A. Social Indicators (Cambridge, Mass.: M.I.T., 1966).

Bell, Daniel, and Mancur Olson. “[...] a Social Report,” The Public
Interest, vol. 15, Spring 1969.

------ (ed.). Toward the Year 2000 [...] (Boston: Beacon [...]).

Bloom, Benjamin S. [...] the Study of Evaluation of Instructional Programs,
Los Angeles, [...].

------. Stability and Change in Human Characteristics (New York: Wiley,
1964).

------. Toward a Theory of Testing Which Includes Measurement-Evaluation-
Assessment, Occasional Report No. 9, Center for the Study of Evaluation of
Instructional Programs, Los Angeles, 1968.

------ and D. R. Krathwohl. Taxonomy of Educational Objectives, Handbook 1,
The Cognitive Domain (New York: McKay, 1956).

[...] “[...] Management of Innovation,” in William T. Greenwood (ed.),
Decision Theory and Information Systems (Cincinnati: South-Western
Publishing, 1969).

[...] Policy and Strategic Action: Text, Cases, and Management Game
(Englewood Cliffs, N.J.: Prentice-Hall, 1970).

[...] “[...] Impact of Information on Decision Making,” in Alfred
Rappaport (ed.), Information for Decision Making (Englewood Cliffs, N.J.:
Prentice-Hall, 1970).

[...] Interpretation and Appraisal (New York: Teachers College, [...]).

[...] Council for Education. Children and Their Primary Schools, vol. I,
Report; [...] Research and Surveys (London: Her Majesty’s Stationery
Office, 1967).

[...] “Politics and Research: Evaluation of Social Action Programs,”
Review of Educational Research, April 1970.

[...] in Education: Reflections on Supply and Demand, Harvard University,
Cambridge, Mass., 1969. (Mimeographed.)

[...] (Washington, D.C.: U.S. [...]).

[...] “[...] and Validity of Examinations,” in Joseph Lauwerys and David
[...] (eds.), World Year Book of Education 1969: Examinations (London:
Evans Brothers, 1969).

Deutsch, Karl W. The Nerves of Government (New York: Free Press, 1968).

Donald, A. [...] Management, Information and Systems (New York: Pergamon,
1967).

[...] “[...] the Implementation of a Decision,” in William T. Greenwood
(ed.), Decision Theory and Information Systems (Cincinnati, Ohio:
South-Western Publishing, 1969).

[...] “Statewide Evaluation: What Are the Priorities?” Phi Delta [...],
[...] 1970.





Elam, Stanley, and Gordon J. Swanson (eds.). Educational Planning in the
United States (Itasca, Ill.: Peacock, 1969).

Etzioni, Amitai. The Active Society (New York: Free Press, 1966).

Flanagan, John C., et al. The American High School Student, Final Report,
University of Pittsburgh, Project TALENT Office, 1964.

Habermas, Jurgen. Knowledge and Human Interests (Boston: Beacon Press,
1971).

Husen, Torsten. “International Impact of Evaluation,” [...] Education,
Part II, Educational Evaluation: New Roles, New [...], [...] Society for
the Study of Education, Chicago, 1968.

------ et al. International Study of Achievement in Mathematics: Phase 1,
A Comparison of 12 Countries (New York: Wiley, 19[...]).

Ijiri, Yuji, Robert K. Jaedicke, and Kenneth E. Knight. “[...] Accounting
Alternatives on Management [...],” in Alfred Rappaport (ed.), Information
for Decision Making (Englewood Cliffs, N.J.: Prentice-Hall, [...]).

Katzman, Martin T., and Ronald S. Rosen. “The Science and Politics of
National Educational Assessment,” [...] College Record, vol. 71, May 1970.

Kravetz, Nathan. The Evaluation of [...]: An Exploratory Study (Paris:
International Institute for [...] Planning, 1970).

Kuhn, Thomas S. The Structure of Scientific Revolutions (Chicago: The
University of Chicago Press, [...]).

Ladd, Dwight R. [...] Selected Colleges and Universities (New York:
McGraw-[...]).

Major, Joseph L. [...] Ohio Department of Education, Columbus, [...].

[...] Essex, State Superintendent of [...] (San Francisco: Chandler,
1969).

[...] (Berkeley, Calif.: University of California [...]).

Rubin, Louis J. [...] Use and Misuse of [...].

Sheldon, [...], and Wilbert E. Moore [...] (New York: R[...], 1968).





Skagcr, Rodney W ‘Cognitive Skills A Consideration in Evaluating 
Instructional Effects, Evaluation Comment, vol 1, Center for the Study of 
Evaluation of Instruction Programs, Los Angeles, January 1968 

“Objective Based Evaluation Macro-Evaluation,” Evaluation 


r , i J V, — *-»«»uauuu macro-evaluation, Evaluation 

ommen , vo 2, Center for the Study of Evaluation of Instructional Programs, 

Los Angeles, June 1970 6 


Thurston, Philtp H “Who Should Control Information Systems?” in 
Ktchard A Kaimann and Robert W Marker (eds ), Educational Data 
Processing New Dimensions and Prospects (Boston Houghton Mifflin, 1967) 


Wilensky Harold L Organizational Intelligence Knowledge and Polity in 
Government and Industry (New York Basic Books, 1967) 


Williams Lawrence K “The Human Side of a Systems Change," in 
Kicnard A Kaimann and Robert K Marker (eds), Educational Data 
Processing New Dimensions and Prospects (Boston Houghton Mifflin, 1967) 


INDEX 


Ahmann, J. Stanley, 175, 197, 206n
Allen, James, 11
Anderson, Stanford, 183n
Andrews, Gary M., 187
Attitudes, testing of, 27-34


Background variables, 107-131
  description vs. explanation, 148-160
  proposed analysis of, [...]
  (See also Input factors)
Bailey, Stephen, 182n
Barnes, Melvin W., 16n
Bauer, Raymond, 180n
Bell, Daniel, 178-180
Bell, Max S., 212n

Benne, Kenneth, 19[...]
Bennis, Warren, 19[...]
Berdie, Frances S., [...]
Bernstein, Richard J., [...]
Biderman, Albert, [...]
Bl[...], Stanley, [...]
Bloom, Benjamin S., [...]
Bowles, Frank, 9
[...], 189-190
Broom, H[...]
Brown, Roger, [...]
Bruner, Jerome, 58, 185
Bruno, Nancy L., 167n


Campbell, Paul B., 167n
CAPE (Committee on Assessing Progress of Education), 17
[...], 56n, 65, 16[...]
Carnegie Corporation, [...], 16-17, 22
[...], William, 11
Causal explanation, absence [...], 148-160
[...], Robert, 191n
Chomsky, Noam, 5[...]
Citizenship, assessment of, 34-36, 57, 65[...]
Citizenship exercises, [...], 88-93
[...]man, [...], 195, 206
[...]man, James, [...]
[...]man Report, [...]
[...]on, John, 11, 15-16
[...], R. C., 90



Criterion referencing
  and NAEP exercises, 99-106
  theoretical issues of, 93-98
Cronbach, Lee J., 9, 27, 115


Dartmouth Conference, 51-52
Davis, James, 195
Davison, W. Phillips, 188n
Downs, Anthony, 191n
Dyer, Henry, 9


ECAPE (Exploratory Committee on Assessing the Progress of Education),
    16-17, 20, 21, 28, 43
  role of, 36-42
  and subject-matter objectives, 43-66
Education in the United States
  evaluation of, xi-xiii
  goals of, 21-22
    difficulties defining, 21-22, 37-42, 184-186
  inequities of, 22-23
Education Commission of the States (ECS), 18
Elashoff, Janet, 195
Essex, Martin W., 188
Evaluation, 3-4
  and action, 186-192
  of education, xi-xiii
Exercise packages, NAEP’s, utility of, 134-139
Exercises, NAEP
  criteria for, 67
  development process, 68-71
  difficulty levels, 67, 69, 100-104
  release policy, 71, 144-148
  technical problems, 71-93


Feyerabend, Paul, 183
Finley, Carmen J., 67n
Firman, William, 9
Fischer, John, 9
Fishman, Joshua, 188n
Flanagan, John, 9, 25, 26, 138, 174
  on goals of education, 21-23, 62-63
Flynt, Ralph, 9, 115-116
Ford Foundation, 9, 17
Freeman, Howard E., 190n
French, John R. P., Jr., 190n


Gardner, John, vii, 8, 9, 22, 25
Garvin, Alfred, 97-98, 100
Gilbert, John P., 159, 195
  memo on inputs, 125-130
Glaser, Robert, 94-95, 98
Golden, William, 11
Goodlad, John, 165
Goslin, David, 9, 12n, 26n
Grant, Gerald, 108n
Greene, Leroy, 18-19
Gross, Bertram M., 180n
Gross, Neal, 188n
Guilford, Dorothy, 66


Hammond, Phillip E., 131n
Hawes, Gene R., 57n
Hazlett, James, 175
Heilbroner, Robert, 179n
Higgins, Martin J., 43n
Hirst, Paul H., 60
Holland, John, 9, 27-28
Howe, Harold, 11
Husek, T. R., 96n
Hyman, Martin D., 188n


Ijiri, Yuji, 189
Input factors
  Gilbert-Mosteller memo on, 125-130
  Withey memo on, 116-119


Jaedicke, Robert K., 189





Jardim, Anne, 188n
Johnson, Paul F., 16n
Johnson, Tobe, 93
Josephs, Devereaux C., 16n
Justus, Hope, 150-153


Kahn, Robert, 191n
Katz, Daniel, 191n
Katzman, Martin, 108, 119-120, 123
Keppel, Francis, vii, 8, 9, 22-24, 115, 123, 151
Kerr, John F., 46n
Knight, Kenneth E., 189
Knowledge, NAEP’s definition of, 20-42
Krathwohl, D. R., 210


Ladd, Dwight R., 190n
Larsen, Roy E., 16n
Larson, Robert, 206n

NAEP (National Assessment of Educa- 
tional Progress) 
achievements of, 161 — 169 


budget of, 2 
evaluation of, 2, 5 — 7 
expectations for, 14—15 
guidelines for, 10 — 12 
history of initial objectives, 9-15 
organizational development, 

8 — 9, 11, 15-19 


limits of, 161 — 1 72 
objectives of, vn, x-xiu, 4. » ' 
positive comments on, 
reports of (see Reports) 
response by on choices. 198 - 204 
on goals, 196-197 .205-221 
on utilization. 2, 8” 22 [ l 
as social indicator. 184- II 86 
(Sre nfo Exercises, Testing, etc ) 
daeel, Ernest, I82n 
form-referencing, theoretical issues 

of. 93-98 


Lazarsfeld, Paul, 188n 
Literature objectives, 56n, 72- 

McBnde, Katherine E , 16« 
Mannheim, Karl, 190n 
Marrow, Alfred, I90n 
Mastie, Marjorie M . I 70 " 
Meade, Edward, 11 
Meisels, Samuel, 44« 

Merrill, Richard J , 76 
Menvin, Jack C . 43n 
Mmear, Leon, 1 1 
Moir, Ronald E . 

Montesi, Richard, 

Monson, Eltmg E . 

Morrisett, Lloyd, 9 
Moses, Lincoln, 195 
Mosher, Edith. g , 95 

Mosteller, Frederick, 15^ ^
    memo on inputs, ^ j62
Music objectives, 5 ^
    conflict over, 9*


Objectives, NAEP (see NAEP, objectives of)
    within subject areas, 43-66
    alternative approaches, 61-63
    contractor involvement, tm4(^ 4
    difficulties with general. NAEP, 56-61
    formulation process, 45-51, 55-56 .. gg
    measuring changes in, 64-66
    role of, 85-66
Olson, Mancur, 1™


Piaget, Jean, IM
Pifer, Alan, x, 9
Plowden, Lady Bridget,
Plowden Report, vaFP.

r °'"H-iSjM"rii-' 2 »-': 3 -'- M 

Popham, James, 91n, 96n



Purposes, NAEP (see NAEP, objectives of)


Rabasseire, Henri, 179n
Rappaport, Alfred, 189n 
Reading exercises, 80-85, 100-103
Reading objectives, 56n, 162
Recommendations, general, 172-177
Rein, Martin, 182n
Reinert, Paul C., 16n
Reports, NAEP, 140-160
    description vs. explanation, 148-160
    release policies, 144-148
    utility of, 140-143
Rickover, Hyman G., 13, 23, 58, 166
Rosen, Ronald, 108, 119-120, 123 
Rossi, Peter, 9 


Sampling design, 132-133
    and NAEP’s utility, 134-139
Schabacker, William H., 167n
Science exercises, 74-77, 85-88, 90
Science objectives, 72, 100
Seeley, David, 9
Sewell, William H., 188n
Sirotnik, K., 96n
Skager, Rodney, 97
Smith, Philip, 165
Smith, Warren, 180n
Smythe, Mabel, 16n, 36-40
Social indicators
    in education, 184-186
    in general, 3-4, 178-184
Social Studies objectives, 56
    conflict over, 53-54
Stake, Robert, 57
Stoddard, George, 11
Subject areas, 20-42
    consequences of, 26-27
    defining objectives of, 37-42
    objectives within (see Objectives, within subject areas)
    (See also specific subject area)
Swanson, Gordon, 165n 


Testing
    for attitudes, 27-34
    criterion-referenced, 93-106
    NAEP’s influence on, 13-14
Thorndike, Robert, 11
Thurston, Philip H., 190n
Toan, Arthur B., 188, 190n
Tufts, Edith, 189n
Tukey, John, 9, 10, 12, 41, 108-109, 115-116, 119-120, 123-124, 148-149, 195, 206
Tyler, Ralph W., vii, 8, 9, 12, 16, 20, 22, 25, 28, 61, 108-109, 115-116, 119-120, 123, 148-149
    on deciding what to test, 37-42
    on guidelines for NAEP, 10, 19
    on measuring intangibles, 32-33
    on subject areas, 35-36
    on USOE involvement, 48


United States Office of Education (USOE), 16, 48
Uses of the Assessment
    claims for, 169-171
    planning for, 172


Vargas, J. S., 96


Weiss, Robert S., 182n
Wilensky, Harold, 187-188, 191n
Wilson, Logan, 11
Winch, Peter, 182n
Withey, Stephen B., 16n, 36
    memo on inputs, 116-119
Witmer, Helen, 189n
Wolfe, John K., 172n
Womer, Frank, 5n, 104, 170n 
Writing exercises, 76 
Writing objectives, conflict over, 51-53
Wyatt, Robert, 11


Zaleznik, Abraham, 188n



